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ABSTRACT 



The invention is embodied in a block adjustment method 
and apparatus which simultaneously aligns a set of overlap- 
ping images in order to construct an image mosaic. For each 
one of the images of the set, the invention performs the 
alignment by determining ray directions relative to a 
3 -dimensional coordinate system at plural predetermined 
pixel locations in the one image. For each one of the plural 
pixel locations in the one image, ray directions are deter- 
mined relative to the 3-dimensional coordinate system of the 
corresponding pixel location in each one of the other images 
overlapping the one predetermined pixel location of the one 
image. Then, incremental deformations of the overlapping 
images are computed which simultaneously minimize dif- 
ferences between the ray directions of plural pairs of the 
overlapping images which include the one image. The 
foregoing is performed for each of the plural predetermined 
pixel locations of the one image simultaneously. The images 
are warped in accordance with the incremental deformations 
and the process is repeated. 

79 Claims, 25 Drawing Sheets 
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BLOCK ADJUSTMENT METHOD AND 
APPARATUS FOR CONSTRUCTION OF 
IMAGE MOSAICS 

BACKGROUND OF THE INVENTION 

1. Technical Field 

The invention is related to computer systems for con- 
structing and rendering panoramic mosaic images from a 
sequence of still images, video images or scanned photo- 
graphic images or the like. 

2. Background Art 

Image -based rendering is a popular way to simulate a 
visually rich tele-presence or virtual reality experience. 
Instead of building and rendering a complete 3D model of 
the environment, a collection of images is used to render the 
scene while supporting virtual camera motion. For example, 
a single cylindrical image surrounding the viewer enables 
the user to pan and zoom inside an environment created from 
real images. More powerful image-based rendering systems 
can be built by adding a depth map to the image or by using 
a larger collection of images. 

The present invention is particularly directed to image- 
based rendering systems without any depth information, i.e., 
those which only support user panning, rotation, and zoom. 
Most of the commercial products based on this idea (such as 
QuickTime VR) use cylindrical images with a limited ver- 
tical field of view, although newer systems support full 
spherical maps (e.g., Interactive Pictures, and RealVR). A 
number of techniques have been developed for capturing 
panoramic images of real -world scenes. One way is to 
record an image onto a long film strip using a panoramic 
camera to directly capture a cylindrical panoramic image. 
Another way is to use a lens with a very large field of view 
such as a fisheye lens. Mirrored pyramids and parabolic 
mirrors can also be used to directly capture panoramic 
images. A less hardware-intensive method for constructing 
full view panoramas is to take many regular photographic or 
video images in order to cover the whole viewing space. 
These images must then be aligned and composited into 
complete panoramic images using an image mosaic or 
"stitching" method. Most stitching systems require a care- 
fully controlled camera motion (pure pan), and only produce 
cylindrical images. 

Cylindrical panoramas are commonly used because of 
their ease of construction. To build a cylindrical panorama, 
a sequence of images is taken by a camera mounted on a 
leveled tripod. If the camera focal length or field of view is 
known, each perspective image can be warped into cylin- 
drical coordinates. To build a cylindrical panorama, 3D 
world coordinates are mapped to 2D cylindrical screen 
coordinates using 



where 0 is the panning angle and v is the scanline. Similarly, 
3D world coordinates can be mapped into 2D spherical 
coordinates 6, (|> using 

0 = tan" 1 [X ID and 0 = tan" 1 (y/ >/ X 3 + Z 2 ). 

Once each input image has been warped, constructing the 
panoramic mosaics for a leveled camera undergoing a pure 
panning motion becomes a pure translation problem. Ideally, 
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to build a cylindrical or spherical panorama from a horizon- 
tal panning sequence, only the unknown panning angles 
need to be recovered. In practice, small vertical translations 
are needed to compensate for vertical jitter and optical twist. 
5 Therefore, both a horizontal translation t^ and a vertical 
translation l y are estimated for each input image. To recover 
the translational motion, the incremental translation ot^ot^., 
S^) is estimated by minimizing the intensity error between 
two images, I 0) l v 

10 

£(<5O=£[/iK+<50-/ 0 (*/)] 2 



15 where 

*rC*iOV) and jt'j— (je' A y' i )-(r J +r K jf/+jr > ) 

are corresponding points in the two images, and t^t^, t y ) is 
the global translational motion field which is the same for all 
20 pixels. After a first order Taylor series expansion, the above 
equation becomes 

25 

where e i «I 1 (x' 1 )-I 0 (x i ) is the current intensity or color error, 
and gf r =Vlj(x' 1 ) is the image gradient of l 1 at x\. This 
minimization problem has a simple least-squares solution, 

30 

35 To handle larger initial displacements, a hierarchical coarse - 
to -fine optimization scheme is used. To reduce discontinui- 
ties in intensity and color between the images being 
composited, a simple feathering process is applied in which 
the pixels in each image are weighted proportionally to their 

40 distance to the edge (or more precisely, their distance to the 
nearest invisible pixel). Once registration is finished, the 
ends are clipped (and optionally the top and bottom), and a 
single panoramic image is produced. 

Creating panoramas in cylindrical or spherical coordi- 

45 nates has several limitations. First, it can only handle the 
simple case of pure panning motion. Second, even though it 
is possible to convert an image to 2D spherical or cylindrical 
coordinates for a known tilting angle, ill-sampling at north 
pole and south pole causes big registration errors. (Note that 

50 cylindrical coordinates become undefined as you tilt your 
camera toward north or south pole.) Third, it requires 
knowing the focal length (or equivalently, field of view), 
While focal length can be carefully calibrated in the lab, 
estimating the focal length of the lens by registering two or 

55 more images using conventional techniques is not very 
accurate. 

The automatic construction of large, high- resolution 
image mosaics is an active area of research in the fields of 
photogrammetry, computer vision, image processing, and 

60 computer graphics. Image mosaics can be used for many 
different applications. The most traditional application is the 
construction of large aerial and satellite photographs from 
collections of images. More recent applications include 
scene stabilization and change detection, video compression 

65 and video indexing, increasing the field of view and reso- 
lution of a camera, and even simple photo editing. A 
particularly popular application is the emulation of tradi- 
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tional film-based panoramic photography with digital pan- of overlapping images, and using these estimates to warp 

oramic mosaics, for applications such as the construction of each input image so as to reduce the misregistration. This is 

virtual environments and virtual travel. In computer vision, less ambitious than actually recovering a perspective depth 

image mosaics are part of a larger recent trend, namely the value for each pixel, but has the advantage of being able to 

study of visual scene representations. The complete descrip- 5 simultaneously model other effects such as radial lens dis- 

tion of visual scenes and scene models often entails the tortions and small movements in the image, 

recovery of depth or parallax information as well. The invention is embodied in a block adjustment method 

In computer graphics, image mosaics play an important and apparatus which simultaneously aligns a set of overlap- 
role in the field of image -based rendering, which aims to ping images in order to construct an image mosaic. For each 
rapidly render photorealistic novel views from collections of 10 0 ne of the images of the set, the invention performs the 
real (or pre -rendered) images. For applications such as alignment by determining ray directions relative to a 
virtual travel and architectural walkthroughs, it is desirable 3-dimensional coordinate system at plural predetermined 
to have complete (full view) panoramas, i.e., mosaics which pixel locations in the one image. For each one of the plural 
cover the whole viewing sphere and hence allow the user to pixel locations in the one image, ray directions are deter- 
look in any direction. Unfortunately, most of the results to 15 mined relative to the 3-dimensional coordinate system of the 
date have been limited to cylindrical panoramas obtained corresponding pixel location in each one of the other images 
with cameras rotating on leveled tripods with carefully overlapping the one predetermined pixel location of the one 
designed stages adjusted to minimize motion parallax. This image. Then, incremental deformations of the overlapping 
has limited the users of mosaic building ("stitching") to images are computed which simultaneously minimize dif- 
researchers and professional photographers who can afford 20 ferences between the ray directions of plural pairs of the 
such specialized equipment. Present techniques are difficult overlapping images which include the one image. The 
because generally they require special camera equipment foregoing is performed for each of the plural predetermined 
which provides pure panning motion with no motion paral- pixel locations of the one image simultaneously. The images 
lax. are warped in accordance with the incremental deformations 
Problems to be Solved by the Invention: 25 and the process is repeated. 

It would be desirable for any user to be able to "paint" a In a pre ferred implementation, the one image is divided 

full view panoramic mosaic with a simple hand-held camera into plural palc hes and the plural predetermined pixel loca- 

or camcorder. However, this requires several problems to be ^ons constitute one pixel in each of the plural patches, 

overcome. Typically, the one pixel in each of the patches comprises a 

First, cylindrical or spherical coordinates should be 30 cea ter pixel of the patch. The invention is independent of the 

avoided for constructing the mosaic, since these represen- jy pe of deformations employed. Thus the deformations 

tations introduce singularities near the poles of the viewing could be defined by incremental 3-dimensional rotations 

sphere. relative to the coordinate system or by planar perspective 

Second, accumulated misregistration errors need to be transformations for example, 

corrected, which are always present in any large image 35 ^ may be deflne d by 3Klimensional 

mosaic. For examples one renters a sequence of images rota(joDS rela , ive tQ ^ t6mor b lanar 

using pairwae alignments, there is .usually a gap between the erspectiv6 transformations. Each of the ray directions is a 

last image and the first one even if these two images are the extending from an origin of the coordinate system to the 

same^A simple 'gap closing technique is introduced in the corresponding pixel location . Each of the ray directions is 

specification below which forces the first and last image to 40 defined re , atjve tQ one 0f al , of (&) a ^ 

be the same and distributes the resulting corrections across i(i (b) a 3 . dimensional rotational trans formation rela- 

the .mage sequence. Unfortunately, this approach works £ ye , 0 ^ ^ ordMate ltm> and (c) an ^ focal length . 

only for pure panning motions with uniform motion steps, a computation of the incremental deformation may involve an 

significant limitation. incremental rotation or both an incremental rotation and an 

TTnrd, any delations from the pure parallax-free motion 45 incremental ch in focal le th . 

model or ideal pinhole (perspective) camera model may , . . „ . ... , ,. , ,. . . 

result in local misregistrations, which are visible as a loss of k Inmal| y> a ° imt,al s f? u " ua , 1 ah S nmen 1 t of lh ' lma f s ^ 

detail or multiple images (ghosting). be , "> establish a ^formation of each image 

r & v relating the respective image to the coordinate system. This 

SUMMARY OF THE INVENTION 5o transformation is then used in the block adjustment align- 

The specification discloses how to avoid using cylindrical ment process, 
or spherical coordinates for constructing the mosaic by It is preferable to determine which pairs of the images 
associating a rotation matrix (and optionally a focal length) significantly overlap, and then discard other pairs of the 
with each input image, and performing registration in the images from the set of overlapping images. For example, 
input image's coordinate system. Such mosaics are referred 55 other pairs of the images are discarded from the set unless 
to herein as rotational mosaics. The system can use a at least a predetermined fraction of the pixels thereof over- 
postprocessing stage to project such mosaics onto a conve- lap. 

nient viewing surface, i.e., to create an environment map The invention can be used to create full view panoramic 

represented as a texture-mapped polyhedron surrounding the mosaics from image sequences. Unlike current panoramic 

origin. 60 stitching methods, which usually require pure horizontal 

The block adjustment method and apparatus claimed in camera panning, the disclosed system does not require any 

the present application solves the problem of accumulated controlled motions or constraints on how the images are 

misregistration errors, to find the optimal overall registra- taken (as long as there is no strong motion parallax). For 

tion. example, images taken from a hand-held digital camera can 

The specification discloses how to solve the problem of 65 be stitched seamlessly into panoramic mosaics, 

loss of detail or image ghosting is solved by computing local By taking as many images as needed, image mosaics can 

motion estimates (block-based optical flow) between pairs be constructed which cover as large a field of view as 



06/17/2004, EAST Version: 1.4.1 



5,987,164 

5 6 

desired, e.g., a complete environment map. Because the FIG. 17 illustrates pair-wise adjustment of a bundle of 

image mosaics is represented in the invention using a set of direction rays in accordance with the preferred embodiment 

transforms, there are no singularity problems such as those (corresponding to Equation 36 below) of the global block 

existing at the top and bottom of cylindrical or spherical adjustment method. 

maps. This method is fast and robust because it directly 5 FIG. 18 is a flow diagram illustrating the global block 

recovers 3D rotations instead of general 8 parameter planar adjustment method. 

perspective transforms. By mapping the mosaic onto an FIG. 19 illustrates pair-wise motion estimation between 

arbitrary texture-mapped polyhedron surrounding the origin, ixels and ayerage motion computat ion in accordance with 

the virtual environment can be viewed or explored using a degnost i n g method. 

standard 3D graphics viewers and hardware without requir- 10 ___ _ A . _ ' c j u j- 

ing special-purpose players. Once a mosaic has been FIG 20 is a flow diagram illustrating a preferred embod!- 

constructed, it can, of course, be mapped into cylindrical or ment of the de g hostin g metbod - 

spherical coordinates, and displayed using a special purpose FIGS. 21 and 22 illustrate complementary warpings of 

viewer. Such specialized representations are not necessary, each image in an overlapping pair of images in accordance 

and represent just a particular choice of geometry and is with the deghosting method of FIG. 20. 

texture coordinate embedding. Instead, using a mapping FIG. 23 illustrates the averaging of plural pair- wise 

process described herein, the mosaic can be converted to an motion estimates in accordance with one aspect of the 

environment map, i.e., by mapping the mosaic onto any deghosting method. 

texture -mapped polyhedron surrounding the origin. This FIG. 24 illustrates an array of patch -based motion esti- 

allows the use of standard 3D graphics APIs and 3D model 20 mates in accordance with the deghosting method, 

formats, and allows the use of 3D graphics accelerators for FIG 25A illustrates an inverse mapping method in accor- 

texture mapping. The mapping process can be employed in dance w j tn a preferred embodiment of the deghosting 

a rendering process that can be just as fast as special-purpose method in which per-patch motion estimates are obtained, 

viewers. Furthermore the mapping process can take advan- piG 25B illustrates how the per .p atc h motion estimates 

tagc of 3D graphics (rendering) hardware, support multi- 25 afe int olated t0 produce a finer &id 0 f pe r-pixel motion 

resolution textures, and allow for easy integration with estimates 

regular 3D graphics. FIG. 26 is a signal flow diagram corresponding to one 

BRIEF DESCRIPTION OF THE DRAWINGS embodiment for carrying out the deghosting method. 

30 FIGS. 27 and 28 illustrate a cubic 3D model and corre- 

FIG. 1 is a block diagram illustrating the operation of a sponding 2D texture map for carrying out one embodiment 

system embodying the invention. of a texture mapping me thod. 

FIGS. 2A and 2B is a block diagram illustrating apparatus FIG. 29 illustrates a mapping between triangular vertices 

embodying a system of the invention. of a 3D model and a 2D texture map employed in carrying 

FIG. 3 illustrates apparatus including a camera employed 35 out one embodiment of the texture mapping, 

for the acquisition of images to form a panoramic image. pj GS 30 and 31 illustrate a spherical 3D model and 

FIG. 4 is a detailed view of a portion of the apparatus of corresponding 2D texture map for carrying out one embodi- 

FIG. 3. ment of the texture mapping method. 

FIG. 5 illustrates a perspective warping and image. FIG. 32 is a flow diagram illustrating the texture map 

FIG. 6 illustrates a sequence of images obtained in an 40 computation process, 

incremental image alignment method. FIGS. 33-38 are panoramic images obtained using dif- 

FIG. 7 illustrates a pixel resampling problem encountered fe rent aspects of the present invention in accordance with 

in the incremental image alignment method. respective working examples. 

FIG. 8 is a flow diagram of the incremental alignment 45 DETAILED DESCRIPTION OF THE 

method. PREFERRED EMBODIMENTS 

FIG. 9 is a block diagram illustrating the signal flow in the INTRODUCTION: 

incremental alignment method of FIG. 8. This specification describes how to associate a rotation 

FIG. 10 is a flow diagram illustrating a coarse-to-fine matrix (and optionally a focal length) with each input image, 

resolution hierarchy employed in one embodiment. 50 rather than explicitly projecting all of the images onto a 

FIG. 11 is a flow diagram of the incremental 3D rotation common surface (e.g., a cylinder). In order to reduce accu- 

alignment method mulated registration errors, a global alignment (block 

FIG. 12 is a block diagram illustrating the signal flow in adjustment) is applied to the whole sequence of images, 

. . 1 ir» ; ** v « Z t u j f cm 11 which results m an optimal image mosaic (in the least- 

the incremental 3D rotation alignment method of FIG. 11. . _ , * ,, \ * 

. „ .„ . , , ,55 squares sense). To compensate for small amounts of motion 

FIG. 13 is a flow diagram illustrating a focal length ^ introduced b trans i ations of lhe camera and other 

estimation method which can be employed in carrying out umnodeled distortionS) a local alignment (deghosting) 

the rotation alignment method. method fa employed which warps each ^ge based 0Q tne 

FIG. 14 is a signal flow diagram illustrating a modifica- results of pa irwise local image registrations. Combining 

tion of the signal flow of FIG. 11 for carrying out a 60 5otD gj obal and local alignment significantly improves the 

patch-based alignment. quality of the image mosaics, thereby enabling the creation 

FIG. 15 illustrates adjustment of a bundle of direction rays 0 f full view panoramic mosaics with hand-held cameras, 

in accordance with a first approach to a global block The overall flow of processing in the system is illustrated 

alignment method. in FIG. 1. First, if the camera intrinsic parameters are 

FIG. 16 illustrates adjustment of a bundle of direction rays 65 unknown, the user creates a small mosaic using a planar 

in accordance with a second approach (corresponding to perspective motion model, from which a rough estimate of 

Equation 33 below) to the global block alignment method. the focal length is computed (block 110 of FIG. 1). Next, a 
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complete initial panoramic mosaic is assembled sequentially provide nonvolatile storage of computer readable 

by adding one image at a time and adjusting its position instructions, data structures, program modules and other 

using the rotational motion model (block 120 of FIG. 1). data for the personal computer 20. Although the exemplary 

Then, global alignment (block adjustment) is invoked to environment described herein employs a hard disk, a remov- 

modify each image's transformation (and focal length) such 5 able magnetic disk 29 and a removable optical disk 31, it 

that the global error across all possible overlapping image should be appreciated by those skilled in the art that other 

pairs is minimized (block 130 of FIG. 1). This stage also of computer readable media which can store data that 

removes any large inconsistencies in the mosaic, e.g., the 15 accessible by a computer such as magnetic cassettes flash 

"gaps" that might be present in a panoramic mosaic m ™*y cards > ^al vrdeo disks, Bernoul i cartridges, 

aLmbled using the sequential method. ITien, a local align- 10 acc f? ™ emories (RAM* ™ d only memories 

. /j u .* \ .u j * • i j . ^ i i (ROM), and the like, may also be used in the exemplary 

ment (deghosting) method is invoked to reduce any local v r J v J 

misregistration errors (block 140 of FIG. 1). The final operating environment. 

mosaic can be stored as a collection of images with asso- A number of program modules may be stored on the hard 

ciated transformations, or optionally converted into a disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, 

texture-mapped polyhedron (environment map)(block 150 15 including an operating system 35, one or more application 

of FIG. 1). By mapping the mosaic onto an arbitrary programs 36, other program modules 37, and program data 

texture-mapped polyhedron surrounding the origin, the vir- 38 - A ^ ma y eater commands and information into the 

tual environment is viewed using standard 3D graphics P ersonal computer 20 through input devices such as a 

viewers and hardware (block 160) without requiring special keyboard 40 and pointing device 42. Other input devices 

purpose players. 20 ( not shown ) ma y include a microphone, joystick, game pad, 

satellite dish, scanner, or the like. These and other input 

Exemplary Operating Environment devices are often connected to the processing unit 21 

FIG. 2A and the following discussion are intended to through a serial port interface 46 that is coupled to the 

provide a brief, general description of a suitable computing system bus, but may be connected by other interfaces, such 

environment in which the invention may be implemented. 2 5 as a Parallel port, game port or a universal serial bus (USB). 

Although not required, the invention will be described in the A monitor 47 or other type of display device is also 

general context of computer-executable instructions, such as connected to the system bus 23 via an interface, such as a 

program modules, being executed by a personal computer. video adapter 48. In addition to the monitor, personal 

Generally, program modules include routines, programs, computers typically include other peripheral output devices 

objects, components, data structures, etc. that perform par- 30 ( not shown), such as speakers and printers, 

ticular tasks or implement particular abstract data types. The personal computer 20 may operate in a networked 

Moreover, those skilled in the art will appreciate that the environment using logical connections to one or more 

invention may be practiced with other computer system remote computers, such as a remote computer 49. The 

configurations, including hand-held devices, multiprocessor remote computer 49 may be another personal computer, a 

systems, microprocessor-based or programmable consumer 35 server, a router, a network PC, a peer device or other 

electronics, network PCs, minicomputers, mainframe common network node, and typically includes many or all of 

computers, and the like. The invention may also be practiced the elements described above relative to the personal com- 

in distributed computing environments where tasks are puter 20, although only a memory storage device 50 has 

performed by remote processing devices that are linked been illustrated in FIG. 2 A. The logical connections 

through a communications network. In a distributed com- 40 depicted in FIG. 2 A include a local area network (LAN) 51 

puting environment, program modules may be located both and a wide area network (WAN) 52. Such networking 

local and remote memory storage devices. environments are commonplace in offices, enterprise-wide 

With reference to FIG. 2A, an exemplary system for computer networks, intranets and Internet, 

implementing the invention includes a general purpose When used in a LAN networking environment, the per- 

computing device in the form of a conventional personal 45 sonal computer 20 is connected to the local network 51 

computer 20, including a processing unit 21, a system through a network interface or adapter 53. When used in a 

memory 22, and a system bus 23 that couples various system WAN networking environment, the personal computer 20 

components including the system memory to the processing typically includes a modem 54 or other means for establish- 

unit 21. The system bus 23 may be any of several types of ing communications over the wide area network 52, such as 

bus structures including a memory bus or memory 50 the Internet. The modem 54, which may be internal or 

controller, a peripheral bus, and a local bus using any of a external, is connected to the system bus 23 via the serial port 

variety of bus architectures. The system memory includes interface 46. In a networked environment, program modules 

read only memory (ROM) 24 and random access memory depicted relative to the personal computer 20, or portions 

(RAM) 25. A basic input/output system 26 (BIOS), contain- thereof, may be stored in the remote memory storage device, 

ing the basic routines that helps to transfer information 55 It will be appreciated that the network connections shown 

between elements within the personal computer 20, such as are exemplary and other means of establishing a communi- 

during start-up, is stored in ROM 24. The personal computer cations link between the computers may be used. 

20 further includes a hard disk drive 27 for reading from and The hardware or apparatus of FIG. 2A may be modified 

writing to a hard disk, not shown, a magnetic disk drive 28 to include additional features in order to carry out the present 

for reading from or writing to a removable magnetic disk 29, 60 invention, as illustrated in FIG. 2B. In FIG. 2B, a camera 

and an optical disk drive 30 for reading from or writing to 210 (such as a digital/electronic still or video camera or film 

a removable optical disk 31 such as a CD ROM or other or photographic scanner) captures a sequence of images 220 

optical media. The hard disk drive 27, magnetic disk drive which are then stored in an image memory 230. The method 

28, and optical disk drive 30 are connected to the system bus or processes of the invention are stored as a sequence of 

23 by a hard disk drive interface 32, a magnetic disk drive 65 program instructions in a program memory 240 for execu- 

interface 33, and an optical drive interface 34, respectively. tion by a microprocessor 250. The microprocessor 250, in 

The drives and their associated computer-readable media carrying out the instructions it fetches from the program 
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memory 240, fetches individual ones of the images stored in 
the image memory 230 and computes a transformation or 
camera matrix M for each image. The collection of camera 
matrices M generated by the microprocessor 250 are stored 
in an image warp memory 260. From these, the 5 
microprocessor, in carrying out other program instructions 
stored in the program memory 240, constructs an environ- 
ment model-based texture map which is stored in an 
environmenl/texture map memory 270. The microprocessor 
250 can then employ standard graphics techniques in taking 10 
the texture maps stored in the texture map memory 270 to 
create panoramic images which it displays in an image 
display device or medium 280. 

ALIGNMENT FRAMEWORK AND MOTION MODELS: 
Incremental Alignment 15 
In the invention, image mosaics are represented as col- 
lections of images with associated geometrical transforma- 
tions. The first stage of the mosaic construction method 
computes an initial estimate for the transformation associ- 
ated with each input image. This is done by processing each 
input image in turn, and finding the best alignment between 
this image and the mosaic constructed from all previous 
images. (To speed up this part, one option is to register with 
only the previous image in the sequence.) This reduces the 
problem to that of parametric motion estimation. A well- 
known hierarchical motion estimation framework is 
employed consisting of four parts: (i) pyramid construction, 
(ii) motion estimation, (iii) image warping, and (iv) coarse- 
to-fine refinement. An important element of this framework, 
which the present invention exploits, is to perform the 
motion estimation between the current new input image and 
a warped (resampled) version of the mosaic. This allows us 
to estimate only incremental deformations of images (or 
equivalently, instantaneous motion), which greatly simpli- 35 
fies the computations required in the gradient descent 
method. 

In the invention, the camera motion, unlike many prior art 
methods, is not at all restricted to a simple panning motion, 
but is permitted to be any suitable three-dimensional 40 
rotation, a significant advantage. However, for the sake of 
clarity of illustration, FIG. 3 illustrates a tutorial example in 
which the camera motion is restricted to a simple pure 
panning motion. Referring to FIGS. 3 and 4, a camera 310 
having its optical center fixed at point C (FIG. 3) captures a 45 
sequence of 2D still images I 0 , 1 1( I 2 , I 3 , . . . as it pans, the 
center rays of these images being focused on 3D points P 0 , 
P 1 , P 2 , P 3 , ... at a focal length f from the optical center point 
C. The points P ( - are defined relative to a fixed 3D world 
coordinate system ? x , ? y , ? z indicated in the drawings. Each 50 
2D image I has its own 2D x-y coordinate system whose 
origin (x,y)=(0»0) may be considered to be at the center of 
the image. Thus, for example, image I 0 may have coordi- 
nates x=(x,y) while image I a may have coordinates x'^x'.y'). 
The object is to register all of the images so that a panorama 55 
320 may be constructed. An example of how perspective 
transforms would warp one of the images 510 to register it 
with another image 520 with which it overlaps is illustrated 
in FIG. 5. FIG. 5 illustrates the "keystoning" effect of the 
warping of the image 510 required to produce the desired go 
registration between the two images. The problem is that the 
transformation (or the camera motion) between the two 
images I 0 and l x is unknown so that a transformation 
between x and x' must be inferred from the images them- 
selves before the images can be registered. 

In order to register the two images I 0 (x) and I^x 1 ), where 
x' is computed using some parametric motion model m, i.e., 
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x=f(x;m), the warped image is first computed in accordance 
with 



In the current implementation, bilinear pixel resampling is 
employed as illustrated in FIG, 7. The trick is then to find a 
deformation of ^(x) which brings it into closer registration 
with I 0 (x) and which can also be used to update the param- 
eter m. The warp/register/update loop can then be repeated. 
This specification describes how this can be done for two 
different transformation models, namely 8-parameter planar 
perspective transformations, and 3D rotations, and how this 
can be generalized to other motion models and parameters. 

Given two images taken from the same viewpoint (optical 
center) but in potentially different directions (and/or with 
different intrinsic parameters), the relationship between two 
overlapping images can be described by a planar perspective 
motion model. The planar perspective transformation warps 
an image into another using 



(2) 





mo 


m { 


m 2 ' 


X ' 




m 3 






y 






m 7 


m 8 . 


. 1 . 



where x=(x,y,l) and x'^x'y,!.) are homogeneous or per- 
spective coordinates, and the symbol ~ indicates equality up 
to scale. (Since the M matrix is invariant to scaling, there are 
only 8 independent parameters.) (It may be instructive to 
note here that the vector m referred to in Equation 1 is a 
column vector consisting of the elements m 0 , m 1 , . . . m 8 of 
the matrix M of Equation 2.) Equation 2 can be re -written as 



mo* 4- mi y+mi 
m^x 4- my y + m% 

m 6 x 4- m 7 y 4- m 8 



(3) 



(4) 



To recover the parameters, the process iteratively updates 
the transformation matrix using 



M—(l4-D)M 



where 



do d\ di 
dy d 4 d$ 
rf 6 d-j d$ 



(5) 



(6) 



Resampling image I^x') with the new transformation x'~ 
(I+D)Mx is the same as warping the resampled image I(x) by 
x"~(I+D)x, (ignoring errors introduced by the double resa- 
mpling operation), i.e., 



(1 + do)x+d l y + d2 
d 6 x + d 7 y + {\ +ds) 

(d 3 )x + {\+dt)y + d s 
d 6 x + d 7 y + {l 4-d 8 ) 



(7) 



(8) 



The matrix M may be thought of as producing a large 
change in the image as suggested in FIG. 6 in which a 
relatively large right-to-left corrective movement is 
observed from I^x') to Ij(x), while the subsequent incre- 
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mental deformation D induces an additional relatively small such as Cholesky decomposition of the type described in 
right-to-left movement from ^(x) to I^x'). If this deforma- Press et al„ Numerical Recipes in C: The Art of Scientific 
tion by D provides the final correction, then the placement Computing, Cambridge University Press, Cambridge, 
of a fixed object 610 relative to the second image's England, second edition, 1992. Note that for this problem, 
re-warped coordinate system (x"^") is the about same as the 5 the matrix A is singular without elimination of one of the 
object's placement relative to the first image's coordinate ^ree parameters d 0 , d 4 , d 8 . In practice, we set d 8 =0, and 
system (x,y), as indicated in FIG. 6. therefore only solve an 8-by-8 system. Translational motion 
The transformation of the second image's coordinate is a special case of the general 8-parameter transformation 
system is not likely to shift it by an integral number of image where J is a 2-by-2 identity matrix because only the two 
pixels. More likely, each shifted pixel coordinate of the W parameters m 2 and m 5 are used. The translational motion 
second image will lie somewhere between neighboring pixel model can be used to construct cylindrical and spherical 
coordinates of the first image. FIG. 7 illustrates how a pixel panoramas if each image is warped to cylindrical or spheri- 
being sampled at a non-integral location can be computed as cal coordinates image using a known focal length. Details of 
a bilinear blend of the original pixels. In order to warp now to warp and construct cylindrical and spherical pan- 
images, the pixels of the two images are best resampied 15 oramas are well-known in the art. 

using techniques well-known in the art, such as bilinear The foregoing method may be summarized with respect to 

resampling of the type disclosed in G. Wolberg, Digital the steps of FIG. 8 and the signal flow diagram of FIG. 9 as 

Image Warping, IEEE Computer Society Press Los follows. A pair of overlapping images 905, 910 are input 

Alamitos, Calif., 1990. ( sle P 810) and a transform matrix 915 of the second image 

In order to compute an appropriate value of the deforma- 20 is initialized (step 820). An incremental deformation of the 

tion parameter d, the process minimizes the squared error second image to improve the registration between the two 

metric images is computed (block 830) as will be described below 

in more detail with reference to FIG. 9. A matrix multiplier 

V r? r*?i I 2 (9) ^ updates the transformation matrix with the deformation 

( } B h [ 1 { ,) ' ot * ,)J 25 (step 840). A matrix multiplier 920 transforms the current 

image coordinates with the transform matrix to new coor- 

\\ * 10 ^ dinates with which a warp operator 925 resamples the 
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{ ^ X >)~$J " o ( *')| second image to produce a warped image (860). 

' In order to compute the deformation, a subtractor 930 

= Yj telJjd + ef 30 subtracts the warped image from the first image to produce 

' an error. A gradient operator 935 produces the gradient with 

respect to the new coordinates of the warped image while an 
where e^OO-I^M is *e intensity or color error (one can »40 .computes the .Jacobian matrix of the new or 

use either intensity or three channels of color errors), & r =V "oidinite system with respect to the elements of the 

c , v . , . c z A/ i , v . . . 35 deformation matrix. As noted previously herein, the Jaco- 

1,00 * the image gradient of I, at x,, d=(d d 8 ) is the ^ of ^ w system fc expressed as a 

incremental motion parameter vector, and J-J/x,), where Qf ihe unwarped coordinates, and therefore in FIG. 

9 the Jacobian-computing operator 940 receives the 
unwarped coordinate or vector x as an input. One advantage 
40 of this feature is that the Jacobian operator is a function of 
the current (unwarped) coordinate system instead of the new 
coordinate system, which reduces the computational burden, 
is the Jacobian of the resampied point coordinate x" with a multiplier 945 multiplies each error by the product of the 
respect to d. (The entries in the Jacobian correspond to the corresponding gradients and Jacobians, and a summer 950 
optical flow induced by the instantaneous motion of a plane 45 accumulates the resulting products to produce a residual 
in 3D. It should be noted that while the Jacobian of Equation vector b. A multiplier 955 obtains products of the gradients 
11 is the Jacobian of the warped or resampied coordinate and Jacobians which are transposed by a transposer 960, 
system, its expression on the right-hand side of Equation 11 a fter which another multiplier 965 combines the transposed 
is entirely in terms of x or (x,y), which is the original or and untransposed gradient-Jacobian products. These com- 
unwarped coordinate system, and therefore it is a function of 50 bined products are accumulated in a summer 970 to produce 
the unwarped coordinates.) This least-squares problem has a the Hessian matrix A. A normal equation solver computes 
simple solution through the normal equations the individual elements of the deformation matrix, which is 

Adm _ b (12) then reconstructed by a reconstructor 980. 

If a stopping criteria has been reached (YES branch of 
wnere 55 block 870), then the latest versions of the warped second 

image and of the transformation matrix are stored or output. 
A^^JigigJJf (13) Otherwise (NO branch of block 870), the next iteration 

1 begins starting with step 830. A stopping criteria can assume 

a number of forms, such as a reduction below a predeter- 
is the Hessian and 60 muiecl threshold of the errors computed by the subtractor 

930, or, more preferably, a reduction below a predetermined 
14 - threshold of the average displacement. 
b = 2_j e i J >^ For the foregoing image alignment method, as well as for 

the ones to be described below, a progressive coarse-to-fine 
65 hierarchy may be employed to advantage, in which the 
is the accumulated gradient or residual. These equations can initial iterations of the process, which produce the roughest 
be solved using a symmetric positive definite (SPD) solver estimates, operate on smaller image data structures in which 
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the image data has been reduced to a coarse version of the 
image, while successive iterations operate on progressively 
finer versions of the image until the final iteration, which 
operates at a resolution corresponding to that of the original 
image. Referring to FIG. 10, the process starts at the coarsest 5 
resolution level (step 1010), the alignment process is 
executed (step 1020). If the highest resolution level has not 
yet been reached (NO branch of block 1040), then the image 
data is refined to the next highest image resolution level 
(step 1050) and the alignment process is repeated using the 10 
prior results. It is conventional in multi-level resolution 
schemes, such as that employed in FIG. 10, to modify the 
motion parameters (e.g., the transformation M) when tran- 
sitioning to the next highest resolution level in a manner 
well-known in the art. For example, if the planar perspective 15 
transformation of Equations 2-4 is employed, then for a 
resolution level increase by a factor of n (e.g., n=2), then 
certain elements of the planar perspective transform matrix 
M of Equation 2, namely m 2 and m 5 are increased 
(multiplied) by a factor of n and the elements m 6 and m 7 are 20 
decreased (divided) by a factor of n. As another example, if 
the rotational transformations of Equations 15 and 16 below 
are employed , then the focal length f from which the focal 
length scaling matrix V is constructed (in accordance with 
the definitions immediately below Equation 15) is increased 25 
(multiplied) by a factor of n. 

The 8-parameter perspective transformation recovery 
method works well provided that initial estimates of the 
correct transformation are close enough. However, since the 
motion model contains more free parameters than necessary, 30 
it suffers from slow convergence and sometimes gets stuck 
in local minima. For this reason, it is preferable to use the 
3 -parameter rotational model described next. 
Incremental 3-D Rotational Alignment: 

For a camera centered at the origin, the relationship 35 
between a 3D point p=(X,Y,Z) and its image coordinates 
x=(x,y,l) can be described by 



ping between the image pixels of 1^ and a viewing directions 
in the world (i.e., a 3D world coordinate system). 

Assume for now that the focal length f is known and is the 
same for all images, i.e, W k is the same for all images. (The 
method for computing an estimate of f from an initial set of 
nomographics is given later in this specification.) To recover 
the rotation, an incremental update to R, is performed based 
on the angular velocity ^=(0)^, coy coj: 



Rj-R^R, or M— Vft(Q)R,V l V- 1 



(17) 



where the incremental rotation matrix ft(Q) is given by 
Rodriguez's formula: 



R(n,0H+<sin 0) X(n)+(l-cos 0) X(n) 2 

with 

e=||Q|| ( n=o/e 
and where 



(18) 



X(H) = 



0 -<o t 
(o l 0 



is the cross product operator. Keeping only terms linear in Q, 
the following is obtained: 



M'-VjI+XfQJlR^^V-Xl+D^M, 



(19) 



where 



d 0 = vxtnjv- 1 = 



0 ~Ui z fCJy 

oj x 0 -foi x 

-Uiy/f <O x /f 0 



is the deformation matrix which plays the same role as D in 
equation (5) above. Computing the Jacobian of the entries in 
D a with respect to £3 and applying the chain rule, the new 
Jacobian is obtained, corresponding to the rotational com- 
ponent of instantaneous rigid flow: 



45 



dxT _ dx" dd 



-xy/f /fx 2 // -y 
-f-y 2 tf xy/f x 



(20) 



are the image plane translation, focal length scaling, and 3D 
rotation matrices. For simplicity of notation, it is assumed 
that pixels are numbered so that the origin is at the image 
center, i.e., ex c x ~c y =Q, allowing us to dispense with T. (In 
practice, mislocating the image center does not seem to 
affect mosaic registration methods very much). The 3D 
direction corresponding to a screen pixel x is given by 

/7~R- l V- l x. 

For a camera rotating around its center of projection, the 
mapping (perspective projection) between two images \ k and 
I;, is therefore given by 

M-ViR^R^V,- 1 (16) 

where each image \ k is represented by M^-V^R^ i.e., a focal 
length and a 3D rotation. The task of building image mosaics 
essentially becomes registering a sequence of images by 
recovering their transformation matrices M A . Each transfor- 
mation corresponds to one image and represents the map- 



This Jacobian is then plugged into the previous minimiza- 
tion pipeline to estimate the incremental rotation vector Q, 
50 after which R; can be updated using equation (17) above. 
The 3D rotation estimation is thus a special case of general 
8-parameter perspective transformation, and is more robust 
and efficient because this process only handles 3 unknowns 
instead of 8. 

55 The foregoing 3D rotation alignment method may be 
summarized with respect to the steps of FIG. 11 and the 
signal flow diagram of FIG. 12 as follows. A pair of 
overlapping images 1205, 1210 are input (step 1110) and a 
transform matrix 1215 of the second image is initialized 

60 (step 1120). An incremental rotation of the second image to 
improve the registration between the two images is com- 
puted (block 1130) as will be described below in more detail 
with reference to FIG. 12. A matrix multiplier 1217 updates 
the transformation matrix with the incremental rotation in 

65 accordance with Equation 16 (step 1140). A matrix multi- 
plier 1220 transforms the current image coordinates with the 
transform matrix to new coordinates with which a warp 
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operator 1225 resamples the second image to produce a 
warped image (step 1160). 

In order to compute the incremental rotation, a subtractor 
1230 subtracts the warped image from the first image to 
produce an error. A gradient operator 1235 produces the 5 
gradient with respect to the new coordinates of the warped 
image while a Jacobian operator 1240 computes the Jaco- 
bian matrix of the current coordinate system with respect to 
the elements of the incremental rotation matrix. A multiplier 
1245 multiplies each error by the product of the correspond- 10 
ing gradient and Jacobian, and a summer 1250 accumulates 
the resulting products to produce a residual vector b. A 
multiplier 1255 obtains products of the gradients and Jaco- 
bians which are transposed by a transposer 1260, after which 
another multiplier 1265 combines the transposed and 15 
untransposed gradient-Jacobian products. These combined 
products are accumulated in a summer 1270 to produce the 
Hessian matrix A. A normal equation solver 1275 computes 
the incremental rotation vector Q. An operator 1285 
employs Rodriguez's formula to construct the incremental 20 
rotation matrix from the incremental rotation vector. 

If a stopping criteria has been reached (such as a reduction 
below a predetermined threshold of the errors computed by 
the subtractor 1230) (YES branch of block 1170), then the 
latest versions of the warped second image and of the 25 
transformation matrix are stored or output. Otherwise (NO 
branch of block 1170), the next iteration begins starting with 
step 1130. 

The foregoing rotational alignment method employs 
three-dimensional rotations to achieve the desired warping 30 
effect, and typically produces the same "keystoning" effect 
illustrated in FIG, 5 as the planar perspective method of 
Equations 1-14. Therefore, the term warping process is 
achieved . Accordingly, the term "warp" as employed in this 
specification refers generally to any process employing a 35 
transformation M to produce the desired effect, whether the 
transformation is a planar perspective transformation or a 
three-dimensional rotation transformation. 

Estimating the Focal Length 

In order to apply the incremental 3D rotation method, an 
estimate of the camera's focal length must first be obtained. 
A convenient way to obtain this estimate is to deduce the 
value from one or more perspective transforms computed 
using the 8-parameter method. Expanding the VjRVq" 1 
formulation, 
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From this, estimates of the focal lengths are computed as 
follows: 



m 2 m 2 

fo = ~ J—i 2 if «o + *l - «3 - "J * 0 

m$ +■ m , - m\ - m% 



fo = — if m 0 m 3 -t- mi m 4 * 0 and m 2 * 0. 



as well as 



h ~ 2 2 lf m 7 - m 6 * 0 



/?— " 8 "'*" J " , lf»«»0 M dm,»0. 

mgm7 



If the focal length is fixed for two images, the geometric 
mean of fg and f 1 is taken as the estimated focal length. 
When multiple estimates of f are available, the median value 
is used as the final estimate. Once an initial set of f estimates 
is available, these estimates can be improved as part of the 
image registration process, using the same kind of least 
squares approach as for the rotation. 

The process of estimating the focal length is illustrated in 
FIG. 13. In step 1310, an initial perspective alignment of at 
least some of the images is performed using the alignment 
method described above to provide an initial estimate of the 
transformation matrix M of a particular image. Then, in step 
1320, initial estimates of the elements of the transformation 
matrix M are extracted. In step 1330, the focal length is 
estimated from certain elements of M in accordance with the 
equations for f stated above. 
Other Motion Parameters: 

The same general strategy can be followed to obtain the 
gradient b and Hessian A associated with any other motion 
parameters. For example, the focal length f fc can be adjusted 
by setting f*=(l+e*)4, i.e., 



M-(I+e*D UQ )M 



(21) 



where D 110 is a diagonal matrix with entries (1,1,0). The 
Jacobian matrix is thus the diagonal matrix with entries 
(x,y). In other words, the process is estimating a simple 
re -scaling (dilation). This formula can be used to re-estimate 
the focal length in a video sequence with a variable focal 
length (zoom). If it is desired to update a single global focal 
length estimate, f-(l+e)f, the update equation and Jacobian 
are more complicated. This update is performed as follows: 



where R«fr^]. 

In order to estimate foca) lengths f 0 and f ls it is observed 
that the first two rows (columns) of R must have the same 55 where 
norm and be orthogonal (even if the matrix is scaled), i.e., 
for the rows: 



M-(I+cD u ^)VR 4 Rr 1 va-eD 110 >-a+cD«)M 



D.oDuo-MDuoM" 1 



(22) 



(23) 



m 0 2 +m 1 2 +m 2 2 // 0 2 ^n 3 2 +m 4 2 +m 3 2 // 0 2 
ntc/th+m im4+m 2 m 5 // o 2 =0; 
and for the columns: 

Wo 2 +m3 2 +fn ) s 2 / l 2 =ffJi 2 +m4 2 +m7 2 /i 2 j 



(further simplifications of the second term are possible 
because of the special structure of D 110 ). The Jacobian does 

60 not have a nice simple structure, but can nevertheless be 
written as the product of J rf and dd/de, which is given by the 
entries in D tf . Note, however, that global focal length adjust- 
ment cannot be done as part of the initial sequential mosaic 
creation stage, since this method presupposes that only the 

65 newest image is being adjusted. The issue of global focal 
length estimate refinement is addressed below in this speci- 
fication. 
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The same methodology as presented above can be used to 
update any motion parameter p on which the image-to- 
image homography M(p) depends, e.g., the location of the 
optical center, the aspect ratio, etc. M is set to 



( A 



(24) 



Hence, the entries in 3d/dp can be read from the entries in 
(aM/dp)M"\ 

PATCH-BASED ALIGNMENT METHOD: 

The normal equations given in the previous section, 
together with an appropriately chosen Jacobian matrix, can 
be used to directly improve the current motion estimate by 
first computing local intensity errors and gradients, and then 15 
accumulating the entries in the residual vector b and 
Hlessian matrix A. This straightforward method suffers from 
several drawbacks: it is susceptible to local minima and 
outliers, and is also unnecessarily inefficient. An implemen- 
tation is now presented which is more robust and efficient. 20 

The computational effort required to take a single gradient 
descent step in parameter space can be divided into three 
major parts: (i) the warping (resampling) I^x') into ^(x), (ii) 
the computation of the local intensity errors e 4 - and gradients 
gi, and (iii) the accumulation of the entries in A and b. This 25 
last step can be quite burdensome, since it involves the 
computations of the monomials in J, and the formation of the 
products in A and b (equations (13) and 14)). Notice that 
equations (13) and (14) can be written as vector/matrix 
products of the Jacobian J(x t ) with the gradient-weighted 30 
intensity errors e^g,- and the local intensity gradient gig?. If 
the image is divided into little patches and make the 
approximation that J(x,)=Jy is constant within each patch 
(say by evaluating it at the patch center), the normal equa- 
tions can be written as 35 



these details may not exist or may be strongly aliased at 
coarse resolution levels). To help overcome this problem, a 
local search component is added to the registration method. 
Before doing the first gradient descent step at a given 
resolution level, the process can be modified to perform an 
independent search at each patch for the integral shift which 
will best align the I 0 and \ x images (This block-matching 
feature is the basis of most MPEG4 coding algorithms.) For 
a search range of is pixels both horizontally and vertically, 
this requires the evaluation of (2s+l) 2 different shifts. For 
this reason, the local search method is typically applied only 
at the coarsest level of the pyramid. Once the displacements 
have been estimated for each patch, they must somehow be 
integrated into the global parameter estimation process. The 
easiest way to do this is to compute a new set of patch 
Hessians Ay and patch residuals by (equations (25) and (26)) 
to encode the results of the search. As disclosed in prior 
publications (e.g., Bergen et ai., "Hierarchical model-based 
motion estimation", Second European Conference on Com- 
puter Vision 1992, pages 237-252, Santa Margherita 
Liguere, Italy, May 1992, Springe r-Verlag), for patch-based 
flow processes, Ay and by describe a local error surface 



ECu ; )=« y 7 'A / w y +2« / r ft y +c=(M r « y ,,t ) r A y (« r H/)+c 



where 



(27) 



(28) 



(25) 



and 



40 



b * Yj J j b j with bj = 2j e,gi 



(26) 



Here, A- and by are the terms that appear in patch-based 45 
optical flow methods. The new method therefore augments 
step (ii) above with the accumulation of Ay and by (only 10 
additional multiply/add operations, which could potentially 
be done using fixpoint arithmetic), and performs the com- 
putations required to evaluate J ; and accumulate A and b 50 
only once per patch. A potential disadvantage of using this 
approximation is that it might lead to poorer convergence 
(more iterations) in the parameter estimation method. In 
practice, this has not been observed to be the case with the 
small patches (8-by-8). 55 
Local Search: 

Another limitation of straightforward gradient descent is 
that it can get trapped in local minima, especially when the 
initial misregistration is more than a few pixels. A useful 
heuristic for enlarging the region of convergence is to use a 60 
hierarchical or coarse-to-fine process, where estimates from 
coarser levels of the pyramid are used to initialize the 
registration at finer levels. This is a remarkably effective 
approach, and it is preferable to use 3 or 4 pyramid levels in 
the mosaic construction method. However, it may still 65 
sometimes fail if the amount of misregistration exceeds the 
scale at which significant image details exist (i.e., because 



is the minimum energy (optimal) flow estimate. 

Two methods for computing Ay and by from the results of 
the local search will now be described. The first is to fit 
equation (27) to the discretely sampled error surface which 
was used to determine the best shift Uq. Since there are 5 free 
parameters in A,- and by (Ay is symmetric), one can simply fit 
a bivariate quadratic surface to the central E value of 
equation (27) and its 4 nearest neighbors (more points can be 
used, if desired). Note that this fit will implicitly localize the 
results of the local search to sub-pixel precision (because of 
the quadratic fit). 

A second approach is to compute Ay and b.- using the 
gradient -based approach (equations (25) and (26)), but with 
image I 1 (x) shifted by the estimated amount v^. After 
accumulating the new Hessian Ay and residual by with 
respect to the shifted image, the new gradient -based sub- 
pixel estimate can be computed as 



(29) 



Adding the result of equation 29 to the local search 
displacement, i.e., 



is equivalent to setting 



(30) 



(31) 



This second approach is preferred, since it results in Ay 
estimates which are non-negative definite (an important 
feature for ensuring that the normal equations can be solved 
stably), and since it better reflects the certainty in a local 
match. (An analysis of these two approaches can be found 
in Tian et al., "Algorithms for subpixel registration", Com- 
puter Vision, Graphics and Image Processing, Vol. 35, pages 
220-233, 1986.) 

In the preferred approach, if the local search method 
returns a best displacement estimate of u 0 , the intensity 
errors and gradients are evaluated at the shifted patch, the Ay 
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and by quantities are computed as before, and the b ; vector 
is then decremented by A y .u 0 . 
Termination Conditions: 

A processor carrying out the alignment process must 
know how long to iterate at a given resolution (coarse/fine) 5 
level before proceeding to the next resolution level (or 
terminating the alignment process). Rather than looking at 
the energy (RMS intensity error), the average displacement 
between the old and new warped image locations is mea- 
sured (in practice, the RMS displacement is measured at the 10 
image comers). This is an easier measure on which to set a 
threshold (a current implementation uses Vie of a pixel). 
Downhill Energy Descent: 

Because the method is to minimize a non-linear least 
squares problem, it is possible for an update to not actually 15 
go downhill in energy. If this occurs, the process preferably 
uses the previously computed parameter update vector, but 
halves the step size until either the energy has decreased or 
the termination condition is passed (very little movement). 
Robustification: 20 

Sometimes, the results of local patch-based alignment 
will be grossly erroneous (initial misregistration is too large, 
something has moved, image specularities, etc.). To com- 
pensate for this, the RMS errors associated with each patch 
are scanned to determine an overall (robust) RMS error for 25 
the image. Preferably, the process then down-weights the 
contributions from those patches whose RMS error signifi- 
cantly exceed the image RMS error estimate. 

The patch based alignment method may be implemented 
by modifying the inner loop 1290 of the signal flow diagram 30 
of FIG. 12 in the manner illustrated in FIG. 14. Specifically, 
the inner loop 1290 is modified, first, by postponing the 
multiplication of the Jacobian matrix with the gradients until 
after the gradients have been summed by the summer 1270 
(1270* in FIG. 14). Moreover, the Jacobian operator 1240' of 35 
FIG. 14 differs from the Jacobian operator of FIG. 12 in that 
in FIG. 14 the Jacobian operator only computes the Jacobian 
matrix once within each entire image patch, preferably 
evaluating the Jacobian matrix at the center of the patch. 
Furthermore, the summers 1250' and 1270' of FIG. 14 differ 40 
from their counterparts 1250 and 1270 in FIG. 12 in that in 
FIG. 14 these summers only sum over the pixels within the 
current image patch and then stop. For this purpose, the 
output of the summer 1270' of FIG. 14 is multiplied first by 
the patch Jacobian matrix by multiplier 1420 and then by the 45 
corresponding transpose by multiplier 1430. The resulting 
product is then summed over all patches by summer 1450 to 
produce A. Similarly, the output of summer 1250' is multi- 
plied by the patch Jacobian matrix by a multiplier 1422 and 
the resulting product is summed over all patches by summer 50 
1440 to produce b. Thus, the two summers 1440, 1450 
constitute a separate patch loop 1470 that sums on a patch- 
by-patch basis while the remainder of the inner loop oper- 
ates on all pixels within a given patch. 

The robustification feature referred to above is carried out, 55 
in one embodiment, by an error summer 1460 which adds all 
the errors for a given patch, and a divider 1470 which takes 
the reciprocal of the sum of the errors and a weighter 1480 
which computes a weight (a fractional number) based upon 
the reciprocal of the error sum. Such a weight is applied to 60 
de-weight the outputs of the patch adders 1250' and 1270' at 
de-weight multipliers 1485, 1487. 
GLOBAL ALIGNMENT BLOCK ADJUSTMENT: 

The sequential mosaic construction method described 
above in this specification does a good job of aligning each 65 
new image with the previously composited mosaic. 
Unfortunately, for long image sequences, this approach 



suffers from the problem of accumulated misregistration 
errors. This problem is particularly severe for panoramic 
mosaics, where a visible "gap* 1 (or "overlap") will exist 
between the first and last images in a sequence, even if these 
two images are the same. This problem can arise in 
cylindrical, perspective, or rotational panoramic representa- 
tions. 

The Problem of Closing the Gap: 

Even with the best methods for recovering rotations and 
focal length, when a complete panoramic sequence is 
stitched together, there will invariably be either a gap or an 
overlap (due to accumulated errors in the rotation estimates). 
One approach to this problem is to register the same image 
at both the beginning and the end of the sequence. The 
difference in the rotation matrices (actually, their quotient) 
directly tells us the amount of misregistration. This error can 
then be distributed evenly across the whole sequence by 
converting the error in rotation into a quaternion, and 
dividing the quaternion by the number of images in the 
sequence (for lack of a better guess). The process can also 
update the estimated focal length based on the amount of 
misregistration. To do this, the process first converts the 
quaternion describing the misregistration into a gap angle 
Q g . The process can then update the focal length using the 
equation 



360° -0. 



360° 



This method, however, only works with a pure panning 
sequence, a significant limitation. 

A global alignment method is now described that reduces 
accumulated error by simultaneously minimizing the mis- 
registration between all overlapping pairs of images. The 
method is an improvement over the "simultaneous bundle 
block adjustment" technique used in photogrammetry but 
has the following distinct characteristics: 

(1) Corresponding points between pairs of images are 
automatically obtained using patch-based alignment. 

(2) The objective function minimizes the difference 
between ray directions going through corresponding 
points, and uses a rotational panoramic representation. 

(3) The minimization is formulated as a constrained 
least-squares problem with hard linear constraints for 
identical focal lengths and repeated frames. (It has been 
found that it is easier to use certain frames in the 
sequence more than once during the sequential mosaic 
formation process — say at the beginning and at the 
end — and to then use the global alignment method to 
make sure that these all have the same associated 
location.) 

Establishing The Point Correspondences: 

The global alignment method is a feature-based method, 
i.e., it relies on first establishing point correspondences 
between overlapping images, rather than doing direct inten- 
sity difference minimization (as in the sequential method). 
To find the features, the method divides each image into a 
number of patches (e.g., 16-by-l 6 pixels), and uses the patch 
centers as prospective "feature*' points. For each patch 
center in image I /s its corresponding point in another image 
I A could be determined directly by the current inter-frame 
transformation M^M; -1 . However, since these alignments 
are probably not optimal, it is better instead to invoke the 
local search -based patch alignment method described pre- 
viously in this specification. (The results of this patch-based 
alignment are also used for the deghosting method discussed 
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later in this specification.) Pairs of images are examined only 
if they have significant overlap, for example, more than a 
quarter of the image size. In addition, instead of using all 
patch centers, the process selects only those with high 
confidence (or low uncertainty) measure. It is preferred to 5 
set the threshold for the minimum eigenvalue of each 2-by-2 
patch Hessian (available from the patch-based alignment 
process) so that patches with uniform texture are excluded. 
Other measures such as the ratio between two eigenvalues 
can also be used so that patches where the aperture problem 10 
exists can be ignored. Raw intensity error, however, would 
not make a useful measure for selecting feature patches 
because of potentially large inter-frame intensity variations 
(varying exposures, vignetting, etc.). 

Optimality Criteria: 15 

For a patch j in image 1^ let I ; be an image in the set N /Jt 
of overlapping images in which patch j is totally contained 
(under the current set of transformations). Let x^ be the 
center of this patch. To compute the patch alignment, the 
process uses image l k as I 0 and image I; as I a and invokes the 20 
local search method described above in this specification, 
which returns an estimated displacement u ;7 -»Uy*. The cor- 
responding point in the warped image I 1 is thus 



In image Ij, this point's coordinate is 



25 



30 



if the rotational panoramic representation is used. 

Given these point correspondences, one way to formulate 
the global alignment is to minimize the difference between 
screen coordinates of all overlapping pairs of images, 



E(\M k })= £ I 



where 



35 



(32) 



45 



is the projected screen coordinate of x >7 under the transfor- 
mation MjtM," 1 . (Each could be a general homography, 
or could be based on the rotational panoramic 
representation). This has the advantage of being able to 
incorporate local certainties in the point matches (by making 
the above norm be a matrix norm based on the local Hessian 
A /Jt ). The disadvantage, however, is that the gradients with 
respect to the motion parameters are complicated. A simpler 
formulation can be obtained by minimizing the difference 
between the ray directions of corresponding points using a 
rotational panoramic representation with unknown focal 
length. Geometrically, this is equivalent to adjusting the 
rotation and focal length for each frame so that the bundle 
of corresponding rays converge, as shown in FIG. 15. FIG. 
15 shows the adjustment of the bundle of rays x jt so that they 
converge to Xy. Let the ray direction in the final composited 
image mosaic be a unit vector py, and its corresponding ray 
direction in the kth frame be p/*-R J t" 1 V Jt " 1 Xy fc . The block 
adjustment method can be formulated to simultaneously 
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optimize over both the pose (rotation and focal length R*, f*) 
and structure (ray direction py) parameters, 

£«**,/*!. \Pj)) = £llP* -Pj\\ l -^m^jk-pjf (33) 



where 

is the ray direction going through the j'* feature point 
at x ;Jt , y /k in the k* frame, and 



(34) 



located 



(35) 



(Equation (35) absorbs the f t parameter in V k into the 
coordinate definition.) The method set forth here with ref- 
erence to Equation 33 has been described with reference to 
an implementation in which the deformation is carried out 
by three-dimensional rotations R*. However, the same 
method may be carried out with any type of deformation 
such as the planar perspective deformation of Equation 2. In 
this case, in equations such as Equation 33, the term R k is 
replaced by throughout the equation and in equations 
such as Equation 34, the focal length f* is replaced by 1. 

The advantage of the above direct minimization (equation 
(33)) is that both pose and structure can be solved indepen- 
dendy for each frame. For instance, one can solve for p 
using linear least-squares, R* using relative orientation, and 
4 using nonlinear least-squares. The disadvantage of this 
method is its slow convergence due to the highly coupled 
nature of the equations and unknowns. (Imagine a chain of 
spring-connected masses. If we pull one end sharply, and 
then set each mass to the average of its neighbors, it will take 
the process a long time to reach equilibrium. This situation 
is analogous.) For the purpose of global alignment, however, 
it is not necessary to 15 explicitly recover the ray directions. 
The block adjustment method can be reformulated to only 
minimize over pose (R*, Q for all frames I t , without 
computing the ray directions py. More specifically, the pose 
is estimated by minimizing the difference in ray directions 
between all pairs of overlapping images I*, I,, 



(36) 



50 



Z ii^- 



PjtW 2 



■■ Z w 1 ** 



In the foregoing equation, the index k refers to the image that 
is divided into patches, the index j refers to the particular 
55 patch and the index I refers to the images overlapping image 
k. Once the pose has been computed, the estimated ray 
directions py can be computed using the known correspon- 
dence from all overlapping frames N,* where the feature 
point or patch center of patch j is visible, 



n Jk ~ 



z 



(37) 



65 where n ;> is the number of overlapping images in the set of 
overlapping images N ;A where the patch j is completely 
visible. FIG. 16 illustrates the first formulation (Equation 
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33) in which the distance to the consensus direction x y is 
minimized while FIG. 17 illustrates the preferred formula- 
tion (Equation 36) in which the difference between ray 
directions of image pairs are minimized. 
Block Adjustment Solution Method: 

The least-squares problem of equation (36) can be solved 
using the gradient descent method described previously in 
this specification. To recover the pose R*, f^ the process 
iteratively updates the ray directions V jk (x jk , R*,Q to: 

P yt (jr y£ ;a+X(Q i ))lW^/*)- (38) 

The terms of Equation 38 are computed by solving the 
normal equations, whose formation is discussed below. The 
minimization problem (equation 36) can be rewritten as 



MM* 



(39) 



where 



*ju = Pjk -Pjh 



dptidfl 

and 



1 



(40) 



Pjk = X{p jk ), 



d Pjk 



= r: 



-Xjkfk 

-yjtfk 
1%-fk 1 



(41) 
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The normal equations can be formed directly from equa- 
tion (39), updating four different subblocks of A and two 40 
different subvectors of b for each patch correspondence. 
Because A is symmetric, the normal equations can be stably 
solved using a symmetric positive definite (SPD) linear 
system solver. The result is of* and Q*. The correct focal 
length is then obtained by adding Sf^+ft, and the focal length 45 
scaling matrix V is formed from f fc as defined at Equation 15 
above. However, the focal length scaling matrix is only 
required if the rotational deformation matrix R is being used 
instead of a planar perspective deformation M. The block 
adjustment method has been described above with reference 50 
to an implementation employing the rotational deformation 
R but may as well be implemented using another deforma- 
tion M such as a planar perspective deformation instead of 
R. In fact, in the foregoing equations (Equations 38—41), M 
may be substituted for R provided f k in Equation 34 is 
changed to 1, as described previously herein. M may be any 
suitable deformation, such as a planar perspective deforma- 
tion. 

Depending upon whether the focal lengths are known or 
not, there are three possible cases for this minimization: 

(1) 4N unknowns: the focal lengths are unknown and 
different for all frames; 

(2) 3N+1 unknowns: the focal lengths are the same but 
unknown; 

(3) 3N unknowns: all focal lengths are known. 
By incorporating additional constraints on the pose, the 

minimization problem (equation (36)) can be formulated as 
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a constrained least-squares problem which can be solved 
using Lagrange multipliers, as described in Press et aL, 
Numerical Recipes in C: The Art of Scientific Computing, 
Cambridge University Press, Combridge, England, 2 nd 
edition, 1992. Possible linear constraints include: 

(1) Q o =0. The pose of the first frame is unchanged. For 
example, the first frame can be chosen as the world 
coordinate system. 

(2) 5f*-0 for all N frames. All focal lengths are known. 

(3) Sf^Sfo for all k. All focal lengths are the same but 
unknown. 

(4) 8f A ofif f and Q k ^Qf, Frame j is the same as frame k. In 
order to apply this constraint, it is necessary to set f^f, 
and R^R;. 

The above minimization process converges quickly 
(several iterations) in practice. The running time for the 
iterative non-linear least -squares solver is much less than the 
time required to build the point correspondences. 

The foregoing global alignment method is illustrated in 
FIG. 18. In step 1810, patch-based alignment (FIG. 14) is 
performed to produce a rotation matrix for each image. From 
the rotation matrices, the amount of overlap of the various 
images may be determined so that all pairs of overlapping 
images are identified in step 1820. A given image is divided 
into patches (step 1830) and all other images overlapping a 
given patch are defined (step 1840). The patched image is 
thus paired with each one of the other images that overlaps 
the particular patch, thus forming a set of image pairs. The 
center of the particular patch defining one ray direction is 
matched with a corresponding pixel in each image overlap- 
ping the patch defining another ray direction, the two ray 
directions forming a pair of ray directions. The robustifica- 
tion feature described with reference to FIG. 14 may be 
employed to discount or discard unreliable patches (step 
1850). Then, the method performs simultaneous incremental 
3D rotations of all images so as to converge each pair of ray 
directions (step 1860). The resulting incremental rotations 
are used to update the rotational orientation matrix of each 
image in the set, preferably using Equation 18 (step 1870), 
Also, the focal lengths are recomputed from Equations 40 
and 41 as mentioned above to form an updated focal length 
scaling matrix. Provided that the ray divergences remain 
above a predetermined threshold (YES branch of block 
1880), the simultaneous incremental rotations of step 1860 
are repeated and the process loops to steps 1870 and 1880 
until the ray divergences fall below the threshold (NO 
branch of block 1880) at which point the latest version of the 
rotation orientation matrices are stored or output (step 
1890), It should be reiterated that the present description has 
been made in terms of a rotational deformation R but the 
block adjustment method described may use another type of 
deformation M instead of R. 

DEGHOSTING (LOCAL ALIGNMENT) METHOD: 

After the global alignment has been run, there may still be 
localized misregistrations present in the image mosaic, due 
to deviations from the idealized parallax-free camera model. 
Such deviations might include camera translation 
(especially for hand-held cameras), radial distortion, the 
mis-location of the optical center (which can be significant 
for scanned photographs or Photo CDs), and moving 
objects. To compensate for these effects, it would be desir- 
able to quantify the amount of mis-registration and to then 
locally warp each image so that the overall mosaic does not 
contain visible ghosting (double images) or blurred details. 
If the mosaic contains just a few images, the process could 
choose one image as the base, and then compute the optical 
flow between it and all other images, which could then be 
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deformed to match the base. Another possibility would be to employs an inverse mapping method which will now be 
explicitly estimate the camera motion and residual parallax, disclosed. For each pixel in the new (warped) image l k , it is 
but this would not compensate for other distortions. necessary to find the relative distance (flow) to the appro- 
However, large image mosaics are to be constructed, what is priate source pixel. This field is computed using a sparse 
needed is an approach which makes all of the images 5 data interpolation technique. The input to this process is the 
globally consistent, without a preferred base. One approach £et of negative flows -u yjt located at pixel coordinates 
might be to warp each image so that it best matches the u^-x^+u^ In accordance with a current implementation, a 
current mosaic. For small amounts of misregistration, where tent (bilinear) function is placed over each flow sample (the 
most of the visual effects are simple blurring (loss of detail), s ^ e f s currently selected to be twice the patch size). To make 
this should work fairly well. However, for large 10 mi f m l er ?°] a l °! [^aHy re P roduci1 * >° "^P 5 " ln l the J ln * C /• 
misregistrations, where ghosting is present, the local motion f° latcd surf f * >' each accumulated flow value is divided by 

. °. 11 , r the accumulated weight (plus a small amount, say 0.1, to 

estimation would likely fail. , . ... P . ^ . ... t . *. / * 

„ 1 ji_ j i_ round the transitions into regions with no motion estimates). 

Tliese problems are solved by a deghosting method which Nole ^ ^ ^ de ^ QStm method nol iye rfec ' t 

computes the flow between all pairs of images, and then (because it fc lch . based> and not p ixe l-based), it 

infers the desired local warps from these computations. 15 may be desirable to iteratively apply the deghosting method 

While m principle, any motion estimation or optical flow ( in which case thc warping ficld is incrementally updated), 

technique could be used, it is preferable to use the patch- -ph c deghosting method is illustrated in FIG. 20. As in the 

based alignment method described above in this global alignment method, the image orientation matrices 

specification, since it provides the required information and obtained by the patch-based alignment method are used to 

allows us to reason about geometric consistency. Recall that 20 determine which pairs of images overlap (step 2010). For a 

the block adjustment method described immediately above given image, its overlapping mates are each warped in 

provides an estimate p y of the true direction in space corre- accordance with the appropriate rotation matrices and over- 

sponding to the j" 1 patch center in the k" 1 image, x jk . The i a j d on me given image (step 2020). Note that since this step 

projection of this direction onto the k 1 * image is jg carried out for all images, each image of an overlapping 

25 pair will be warped to the other in separate iterations of the 

( 42 ) process, as depicted in FIGS. 21 and 22. Then, for each patch 



in the given image, a pair-wise image motion estimate is 
> ) computed which tends to improve pair-wise registration 

(step 2030). The pair- wise motion estimates are combined in 
30 a normalized average for each patch (step 2040). As illus- 
This can be converted into a motion estimate in nQ 23> overlaving ^o different images on a given 

image provides two different motion estimates (vectors) 

ujt =*)i-Xjk= 1 V {xjt -xjt)= * V up (43) whose weighted average is a vector having a direction lying 

+ 1 ihf jk n * + 1 between the directions of the vectors corresponding to the 

35 two motion estimates. The result for the entire image is 

illustrated in FIG. 24, in which an averaged motion vector is 

This formula underlies the method for deghosting and has associated with each patch. The averaged motion vector for 

the following explanation. Referring to FIG. 19, the local each patch is then used to form a reverse motion vector (step 

motion required to bring the j fA patch center in the tf h image 2050) and the reverse motion vector is employed to map 

into global registration is simply the average of the pairwise 40 source pixels in the un warped image to destination pixel 

motion estimates with all overlapping images, normalized locations (step 2060). The result of this last step is illustrated 

by the fraction x\jJ(xi jk +V). This normalization factor pre- in FIG, 25 A, in which each patch is now associated with an 

vents local motion estimates from "overshooting" in their inverse motion estimate vector pointing from a source pixel 

corrections. (Consider, for example, just two images, where in the image to a destination pixel location, 

each image warps itself to match its neighbor). Thus, the 45 Since the destination pixel locations are not regularly and 

location motion estimate can be computed for each image by densely sampled, a sparse data interpolation technique is 

simply examining its misregistration with its neighbors, employed to interpolate the sparse (per-patch) inverse 

without having to worry about what warps these other motion estimate vectors -u jk to obtain interpolated per-pixel 

neighbors might be undergoing themselves. motion estimates u^x). The per-pixel motion estimates are 

One advantage of the deghosting method of Equation 43 50 then used as a mapping from source pixels to destination 

is that it works for any alignment model or deformation pixels as illustrated in FIG. 25B. This mapping is used to 

chosen by the user. For example, in Equation 42, the resample the pixels at the reverse-mapped pixel locations 

rotational deformation or model R is employed, but this term (step 2070). In other words, the values (colors or shade) of 

drops out in the final calculation of the projection (i .e., in the the source pixels are placed at the destination pixel locations 

right hand side of Equation 42). Thus, the final result or 55 in accordance with 
formula for deghosting applies to any motion model chosen 

by the user such as a rotational warp R or planar perspective Uw=U(*+«*w)- 

warp M, a significant advantage. Therefore, this method can Preferably, a weighted sparse data interpolation technique 

be employed using planar perspective, rotational or any is employed, such as that described in Nielson, "Scattered 

suitable motion model. 60 Data Modeling", IEEE Computer Graphics and 

Inverse Mapping The Warp: Applications, Vol. 13, No. 1, pages 60-70, January, 1993. As 

Once the local motion estimates have been computed, it illustrated in FIG. 25B, the contributions of motion esti- 

is necessary to warp each image so as to reduce ghosting. mates of adjacent patches to each per-pixel motion estimate 

One possibility would be to use a forward mapping method are weighted in accordance to their proximity to the pixel, so 

to convert each image I* into a new image \ k . However, this 65 that a smooth transition between adjacent patches having 

has the disadvantage of being expensive to compute, and of different per-patch motion estimates can be observed in the 

being susceptible to tears and foldovers. Instead, the system per-pixel motion estimates of FIG. 25B. 
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FIG. 26 illustrates a signal processing representation of 
the deghosting method. A set of pair-wise motion estimators 
2610 compute motion estimates between pairs of images. A 
set of adders 2620 and dividers 2630 combine the pair-wise 
motion estimates from the motion estimators 2610 in a 5 
normalized average. Each adder 2620 sums motion esti- 
mates from those motion estimators 26 1 0 responding to 
images overlaying the image processed by the particular 
adder 2620. Each divider 2630 divides the sum computed by 
the corresponding adder 2620 by the normalizationf actor of 1Q 
Equation 43 to produce the normalized average u^> of the 
pair-wise motion estimates for each patch j in each image k. 
A sparse data interpolator 2640 interpolates the per-patch 
motion estimates to produce a motion estimate for each 
pixel, u*(x). A local warp operator 2650 resamples the image 
in accordance with l,J[x)-\Jx+u k (x)). 15 
ENVIRONMENT AND TEXTURE MAP CONSTRUC- 
TION: 

Once a complete panoramic mosaic has been constructed, 
it is necessary to convert the set of input images and 
associated transforms into one or more images which can be 20 
quickly rendered or viewed. A traditional way to do this is 
to choose either a cylindrical or spherical map. When being 
used as an environment map, such a representation is 
sometimes called a latitude-longitude projection. The color 
associated with each pixel is computed by first converting 2 $ 
the pixel address to a 3D ray, and then mapping this ray into 
each input image through the known transformation. The 
colors picked up from each image are then blended using the 
weighting function (feathering) described earlier. For 
example, one can convert the rotational panorama to spheri- 3Q 
cal panorama using the following method: 

(1) for each pixel (0,<j>) in the spherical map, compute its 
corresponding 3D position on unit sphere p=(X,Y,Z) 
where X-cos(<|))sin(e), Y«sin(0), and Z-cos((|))cos(e); 

(2) for each p, determine its mapping into each image k 
using x-T^V^p; 

(3) form a composite (blended) image from the above 
warped images. 

Unfortunately, such a map requires a specialized viewer, 
and thus cannot take advantage of any texture -mapping 
acceleration hardware (without approximating the cylinder's 40 
or sphere's shape with a polyhedron, which would introduce 
distortions into the rendering). For true full-view panoramas, 
spherical maps also introduce a distortion around each pole. 

In order to solve such problems, the process described 
here supports the use of traditional texture-mapped models, 45 
i.e., environment maps. The shape of the model and the 
embedding of each face into texture space are left up to the 
user. This choice can range from something as simple as a 
cube with six separate texture maps, to something as com- 
plicated as a subdivided dodecahedron, or even a latitude- 50 
longitude tessellated globe. (This latter representation is 
equivalent to a spherical map in the limit as the globe facets 
become infinitesimally small. The important difference is 
that even with large facets, an exact rendering can be 
obtained with regular texture-mapping methods and 55 
hardware.) This choice will depend on the characteristics of 
the rendering hardware and the desired quality (e.g., mini- 
mizing distortions or local changes in pixel size), and on 
external considerations such as the ease of painting on the 
resulting texture maps (since some embeddings may leave 60 
gaps in the texture map). 

A method for efficiently computing texture map color 
values for any geometry and choice of texture map coordi- 
nates is now described. A generalization of this method can 
be used to project a collection of images onto an arbitrary 65 
model, e.g., non-convex models which do not surround the 
viewer. 
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It is assumed that the object model is a triangulated 
surface, i.e., a collection of triangles and vertices, where 
each vertex is tagged with its 3D (X,Y,Z) coordinates and 2D 
(u,v) texture coordinates (faces may be assigned to different 
texture maps). Preferably, the model is restricted to trian- 
gular faces in order to obtain a simple, closed -form solution 
(perspective map, potentially different for each triangle) 
between texture coordinates and image coordinates. The 
output of the method is a set of colored texture maps, with 
undefined (invisible) pixels flagged (e.g., if an alpha channel 
is used, then a ct=0 for invisible pixels). FIGS. 27 and 28 
illustrate the case of a cubic 3D model and a texture map 
corresponding to a planar unfolding of the cube. FIG. 29 
illustrates the mapping between vertices of the cube of FIG. 
27 and vertices of the texture map of FIG. 28. FIG. 30 
illustrates a tessellated globe and FIG. 31 illustrates the 
texture map of the globe corresponding to a Mercator 
projection. 

The method consists of the following four steps: 

(1) Paint each triangle in (u,v) space a unique arbitrary 
pseudocolor; 

(2) for each triangle, determine its (u,v,l)-to-(X,Y,Z) 
mapping; 

(3) for each triangle, form a composite (blended) image; 

(4) paint the composite image into the final texture map 
using the pseudocolors assigned in step 1 as a stencil. 

These four steps are described in more detail below. The 
first step, namely the painting of a unique pseudocolor into 
each triangle as a color id tag, uses an auxiliary buffer the 
same size as the texture map. Every pixel within the triangle 
is assigned the pseudocolor of that triangle. An RGB image 
can be used, which means that 2 24 colors are available. (For 
ease of monitoring and debugging, each pseudocolor or face 
color "id" tag is converted into R, G, and B values by first 
un- interleaving the bits and then bit-reversing each color 
byte. This results in a color progression where the high-order 
bits in each channel vary most rapidly.) After the initial 
coloring, the pseudocolors are grown into invisible regions 
using a simple dilation operation, i.e., iteratively replacing 
invisible pixels with one of their visible neighbor pseudo- 
colors. This operation is performed in order to eliminate 
small gaps in the texture map, and to support filtering 
operations such as bilinear texture mapping and MI P map- 
ping of the type disclosed in Williams, "Pyramidal 
Parametrics", Computer Graphics, Vol. 17, No. 3, pages 
1-11, July 1983, For example, when using a six-sided cube, 
the (u,v) coordinates of each square vertex are set to be 
slightly inside the margins of the texture map. Thus, each 
texture map covers a little more region than it needs to, but 
such texture filtering and MIP mapping can be performed 
without worrying about edge effects. 

The second step computes the (u,v,l)-to-(X,Y,Z) mapping 
for each triangle T by finding the 3-by-3 matrix M r which 
satisfies u^-Mjp,. for each of the three triangle vertices i of 
the triangle T. Thus, My-UP' 3 , where U-tiiojuJuJ and 
P-PolPjPJ are formed by concatenating the u, and p. 
3-vectors. (u, is a vector locating one of the three triangle 
vertices in u,v coordinates, while p, is a vector locating one 
of the three triangle vertices in X,Y,Z space.) This mapping 
is essentially a mapping of the triangle T from 3D directions 
in space (since the cameras are all at the origin) to (u,v) 
coordinates. 

In the third step, a bounding box is computed around each 
triangle T in (u,v) space and is then enlarged slightly (by the 
same amount as the dilation in step 1). A composite image 
corresponding to the bounding box is formed by blending all 
of the input images i according to the transformation 



06/17/2004, EAST Version: 1.4.1 



5,987,164 

29 30 

x^-M^M^u for u inside the bounding box. This is a full, texture mapping, by extending the method to handle occlu- 
8-parameter perspective transformation. It is not the same as sions. To do this, the pseudocolored polyhedral model is 
the 6-parameter affine map which would be obtained by painted into each input image using a z-buffering algorithm 
simply projecting a triangle's vertices into the image, and (this is called an item buffer in ray tracing). When compos- 
then mapping these 2D image coordinates into 2D texture 5 jting the image for each face, it is preferable to determine 
space (in essence ignoring the foreshortening in the projec- which pixels match the desired pseudocolor, and set those 
tion onto the 3D model). The error in applying this naive but wnicn do not matc h to be invisible (i.e., not contributing to 
erroneous method to large texture map facets (e.g., those of t j 3e g na j com p 0 site). 
a simple unrefined cube) would be quite large. WORKING EXAMPLES' 

In the fourth step, the pseudocolor associated with each 1Q ^ results of applying the global and local alignment 

pixel inside the composited image is found. This is done by methods t0 image mosaicing are now described. These 

referring to the pseudocolor 0 the tn^e T associated with £ * number f ^ ^ 

the matrix M r used to compute Xjt-MjfcMj- a u. The compos- ... p 4 , . , . # . , * 

ited pixel value (color or intensity) £ placed into a corre- sequences. All of the experiments used the rotaUonal pan- 

sponding pixel location in a triangle in the texture map if the oramic representation with unknown focal length, 

pseudocolor of the composited pixel (stored in the auxiliary 15 GIobal Alignment: 

buffer constructed in step 1 above) matches the face color id The first example shows how misregistration errors 

tag of that triangle. The pseudocoloring/stenciling method quickly accumulate in sequential registration. FIG. 33A 

described here facilitates the assignment of appropriate shows a big gap at the end of registering a sequence of 24 

composite pixel values to pixel locations in invisible regions images (image size 300-by-384) where an initial estimate of 

of the texture map by propagating pseudocolor id tags of 20 focal length 256 is used. The double image of the right 

pixels in visible regions into nearby invisible regions, as painting on the wall signals a big misalignment. This double 

described above. image is removed, as shown in FIG. 33B, by applying the 

FIG. 32 illustrates the texture mapping method. In step global alignment method which simultaneously adjusts all 

3200, a unique pseudocolor is painted into each triangle in frame rotations and computes a new estimated focal length 

the texture model, and pseudocolors are grown into invisible 2 s of 251.8. 

regions as described above. In step 3210, a texture trans- Earlier in this specification, a "gap closing" technique was 

formation or mapping between vertices in 3D space and suggested for handling the accumulated misregistration 

texture (u,v) coordinates space is determined for each tri- error However, this technique only works well for a 

angle T. For each image, the corresponding alignment trans- seque nce of images with uniform motion steps. It also 

formation matrix is recalled in step 3220. In step 3230, the 30 requ i res that tri e sequence of images swing a great circle on 

texture transformation for a given triangle T and the image ^e viewing sphere. The global alignment method, on the 

. alignment transformation matrix are combined to directly other hand? does not make such assump tions. For example, 

map image pixels from the particular image to pixel loca- ±e ^ ohil alignment method can handle the misalignment 

tions in the texture map. In step 3240, a point in texture (doubk image on the right side of big banner and sky ii ght 

space is found for every point in a given triangle of the 3D 35 frame ^ shown in FIG 33Q of an image m0S aic which is 

model, thus defining the pixels in the corresponding triangle constructed from a sequence of 10 images taken with a 

in texture space. In step 3250, for every point in a bounding camera tilted up. FIG. 33D shows the image mosaic after 

box surrounding a given triangle in texture space, a mapping block adjustment where the visible artifacts are no longer 

to the corresponding pixel in every overlapping image is apparent, 

found, and those corresponding pixel values (colors or 40 Alignment (De-Ghosting): 

intensities) are blended to form a composited pixel value. next tw0 exam ples illustrate the use of local align- 

The blending of the corresponding pixel values employs a ment for seqU ences where the global motion model is clearly 

weighted average according to violated. The first example consists of two images taken 

with a hand-held digital camera (Kodak DC40) where some 

coM«) = ^ wfccoiorta). 45 camera translation is present. The parallax introduced by this 

* camera translation can be observed in the registered image 

(FIG. 34A) where the front object (a stop sign) causes a 
Since the pixel coordinate locations in image k are defined double image because of the misregistration This misalign- 
above as x t ~NWu , the foregoing relationship may be mc ?jf significant y reduced using the local alignment 
expressed as to 50 method as seen in FIG. 34B. However, some visual artifacts 
^ still exist because the local alignment is patch-based (e.g., 

patch size 32 is used in FIG. 34B). To overcome this 

coM«) = J] ncoMM t Mf V problem, it is preferable to repeatedly apply local alignment 

4 with a larger patch size followed by a smaller one, which has 

55 the advantage of being able to handle large motion parallax 

Each weight w fc is proportional to the proximity of the image and refine local alignment. FIG. 34C shows the result after 

pixel location x* to the center of the particular image I*. This applying local alignment three times with patch sizes of 32, 

feature down-weights pixels near edges so as to reduce edge 16 and 8. The search range has been set to be half of the 

effects in the composited image (such as edge effects arising patch size for reliable patch motion estimation. FIG. 34D 

from exposure differences between images). In step 1360, 60 shows the flow field corresponding to the left image of FIG. 

the pseudocolors of the composited pixel values are found as 34E. 

described above, and compared with the pseudocolors of the The global motion model also becomes invalid when 

texture map triangles. Each composited pixel value is placed registering two images with strong optical distortion. One 

in corresponding texture map pixel location having the same way to deal with radial distortion is to carefully calibrate the 

pseudocolor. 65 camera. Using local alignment, it is possible to register 

The method can also be used to project a collection of images with optical distortion, without using explicit camera 

images onto an arbitrary object, i.e., to do true inverse calibration (i.e., recovering lens radial distortion). (The 
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recovered deformation field is not guaranteed, however, to 
be the true radial distortion, especially when only a few 
images are being registered. The minimum norm field is 
selected it each deghosting step.) FIG. 35D shows one of 
two images taken with a Pulnix camera and a Fujinon F2.8 5 
wide angle lens. This picture shows significant radial dis- 
tortion; notice how straight lines (e.g., the door) are curved. 
The registration result is shown in FIG. 35 A. The mosaic 
after deghosting with a patch size 32 and search range 16 is 
shown in FIG. 35B. FIG. 35C shows an improved mosaic 10 
using repeated local alignment with patch sizes 32, 16, 8. 

Two additional examples of large panoramic mosaics are 
presented here. The first mosaic uses a sequence of 14 
images taken with a hand-held camera by an astronaut on the 
Space Shuttle flight deck. This sequence of images has is 
significant motion parallax and is very difficult to register. 
The accumulated error causes a very big gap between the 
first and last images as shown in FIG. 36A (notice the 
repeated "12 3" numbers, which should only appear once). 
A good quality panorama (FIG. 36B) is constructed using 20 
the block adjustment method (there is some visible ghosting, 
however, near the right pilot chair). This panorama is further 
refined with deghosting as shown in FIG, 36C. These 
panoramas were rendered by projecting the image mosaic 
onto a tessellated spherical map using the method for 25 
building texture-mapped polyhedra from panoramic image 
mosaics described above in this specification. 

The final example shows how to build a full view pan- 
oramic mosaic. Three panoramic image sequences of a 
building lobby were taken with the camera on a tripod tilted 30 
at three different angles (with 22 images for the middle 
sequence, 22 images for the upper sequence, and 10 images 
for the top sequence). The camera motion covers more than 
two thirds of the viewing sphere, including the top. After 
registering all of the images sequentially with patch-based 35 
alignment, the global and local alignment methods are 
applied to obtain the final image mosaic, shown in FIGS. 
37 A through 37D. These four views of the final image 
mosaic are equivalent to images taken with a very large 
rectilinear lens. 40 
Environment/Texture Mapping: 

FIG. 38 shows the results of mapping a panoramic mosaic 
onto a longitude-latitude tessellated globe. The white tri- 
angles at the top are the parts of the texture map not covered 
in the 3D tessellated globe model (due to triangular elements 45 
at the poles). 

APPLICATIONS AND ADVANTAGES OF THE INVEN- 
TION: 

The invention solves a number of problems encountered 
in constructing full view panoramic image mosaics from 50 
image sequences. Instead of projecting all of the images onto 
a common surface (e.g., a cylinder or a sphere), a represen- 
tation is used that associates a rotation matrix and a focal 
length with each input image. Based on this rotational 
panoramic representation, block adjustment (global 55 
alignment) and deghosting (local alignment) methods dis- 
closed herein significantly improve the quality of image 
mosaics, thereby enabling the construction of mosaics from 
images taken by hand-held cameras. When constructing an 
image mosaic from a long sequence of images, error accu- 60 
mulation problems must be dealt with. The solution is to 
simultaneously adjust all frame poses (rotations and focal 
lengths) so that the sum of registration errors between all 
matching pairs of images is minimized. Geometrically, this 
is equivalent to adjusting all ray directions of corresponding 65 
pixels in overlapping frames until they converge. Using 
corresponding "features" in neighboring frames, which are 
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obtained automatically using the patch-based alignment 
method, the minimization problem is formulated to recover 
the poses without explicitly computing the converged ray 
directions. This leads to a linearly-constrained non-linear 
least squares problem which can be solved very efficiently. 
To compensate for local misregistration caused by inad- 
equate motion models (e.g., camera translation or moving 
object) or imperfect camera projection models (e.g., lens 
distortion), the image mosaic is refined using the deghosting 
method. Each image is divided into small patches to com- 
pute the patch -based alignments. Each image is locally 
warped so that the overall mosaic does not contain visible 
ghosting. To handle large parallax or distortion, the deg- 
hosting method is initiated with a large patch size. This 
deghosting step is then repeated with smaller patches so that 
local patch motion can be estimated more accurately. The 
deghosting method can also be applied to the problem of 
extracting texture maps for general 3D objects from images. 
When constructing such texture maps by averaging a num- 
ber of views projected onto the model, even slight misreg- 
istrations can cause blurring or ghosting effects. The deg- 
hosting method solves this problem, and can inherently 
compensate for related problems such as errors in the 
estimated camera geometry and intrinsic camera models. To 
summarize, the global and local alignment methods, 
together with the efficient patch-based implementation, 
make it easy to quickly and reliably construct high-quality 
full view panoramic mosaics from arbitrary collections of 
images, without the need for special photographic equip- 
ment. By mapping the mosaic onto an arbitrary texture - 
mapped polyhedron surrounding the origin, the virtual envi- 
ronment is exploited using standard 3D graphics viewers 
and hardware without requiring special purpose players. It is 
believed that this will make panoramic photography and the 
construction of virtual environments much more interesting 
to a wide range of users, and stimulate further research and 
development in image-based rendering and the representa- 
tion of visual scenes. 

While the invention has been described in detail with 
reference to its preferred embodiments, it is understood that 
variations and modifications thereof may be made without 
departing from the true spirit and scope of the invention. 

What is claimed is: 

1. A method for performing simultaneously alignment of 
a set of overlapping images, comprising: 

for each one of said images of said set, determining ray 
directions relative to a 3-dimensional coordinate sys- 
tem at plural predetermined pixel locations in said one 
image; 

for each one of said plural pixel locations in said one 
image: 

(a) determining ray directions relative to said 
3-dimensional coordinate system of the correspond- 
ing pixel location in each one of the other images 
overlapping said one predetermined pixel location of 
said one image, and 

(b) computing incremental deformations of said over- 
lapping images which simultaneously minimize dif- 
ferences between the ray directions of plural pairs of 
said overlapping images which include said one 
image. 

2. The method of claim 1 further comprising performing 
said determining and computing steps for each of said plural 
predetermined pixel locations of said one image simulta- 
neously. 

3. The method of claim 1 further comprising warping said 
images in accordance with said incremental deformations 
and then repeating said determining and computing steps. 
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4. The method of claim 1 wherein said one image is 
divided into plural patches and said plural predetermined 
pixel locations constitute one pixel in each of said plural 
patches. 

5. The method of claim 4 wherein said one pixel in each 
of said patches comprises a center pixel of said patch. 

6. The method of claim 1 wherein said deformations are 
defined by incremental 3-dimensional rotations relative to 
said coordinate system. 

7. The method of claim 1 wherein said deformations are 
defined by planar perspective transformations. 

8. The method of claim 1 wherein said ray directions are 
defined by 3-dimensional rotations relative to said coordi- 
nate system. 

9. The method of claim 1 wherein said ray directions are 
defined by planar perspective transformations. 

10. The method of claim 1 wherein each of said ray 
directions is a ray extending from an origin of said coordi- 
nate system to the corresponding pixel location, 

11. The method of claim 1 wherein each of said ray 
directions is defined relative to at least one of (a) a pixel 
coordinate position, (b) a 3-dimensional rotational transfor- 
mation relative to said coordinate system, and (c) an image 
focal length. 

12. The method of claim 11 wherein said computing 
comprises computing at least one of an incremental rotation 
and an incremental change in focal length. 

13. The method of claim 11 wherein said computing 
comprises computing both an incremental rotation and an 
incremental change in focal length. 

14. The method of claim 1 wherein the steps of claim 1 are 
preceded by performing an initial sequential alignment of 
said images to establish a transformation of each image 
relating the respective image to said coordinate system, 

15. The method of claim 14 wherein incremental defor- 
mations comprise incremental changes of said transforma- 
tions. 

16. The method of claim 15 wherein said performing said 
initial sequential alignment comprises minimizing differ- 
ences between overlapping pixels of overlapping ones of 
said images. 

17. The method of claim 16 further comprising determin- 
ing which pairs of said images significantly overlap, and 
discarding other pairs of said images from said set of 
overlapping images. 

18. The method of claim 17 wherein other pairs of said 
images are discarded unless at least a predetermined fraction 
of the pixels thereof overlap. 

19. A computer-readable medium having computer- 
executable instructions for performing the steps recited in 
claim 1. 

20. A method for performing simultaneous alignment of a 
set of overlapping images, comprising: 

for each one of said images of said set, determining ray 
directions relative to a 3-dimensional coordinate sys- 
tem at plural pixel locations in a said one image; 

determining ray directions relative to said 3-dimensional 
coordinate system at corresponding pixel locations in 
the other images overlying said one image to form a 
bundle of rays at each of said corresponding pixel 
locations; and 

computing incremental deformations of said overlapping 
images which simultaneously minimize differences in 
ray directions in each of said bundles of ray directions. 

21. The method of claim 20 wherein the step of computing 
said incremental deformations comprises: 

defining a consensus ray direction at each of said corre- 
sponding pixel locations; and 
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minimizing differences between each of said bundle of 
rays and the corresponding consensus ray direction. 

22. The method of claim 21 wherein said consensus ray 
direction comprises a normalized sum of the ray directions 

5 in the corresponding bundle of rays. 

23. The method of claim 20 further comprising perform- 
ing said determining and computing steps for each of said 
plural predetermined pixel locations of said one image 
simultaneously. 

1Q 24. The method of claim 20 further comprising warping 
said images in accordance with said incremental deforma- 
tions and then repeating said determining and computing 
steps. 

25. The method of claim 20 wherein said one image is 
divided into plural patches and said plural predetermined 

15 pixel locations constitute one pixel in each of said plural 
patches. 

26. The method of claim 25 wherein said one pixel in each 
of said patches comprises a center pixel of said patch. 

27. The method of claim 20 wherein said deformations are 
20 defined by incremental 3-dimensional rotations relative to 

said coordinate system. 

28. The method of claim 20 wherein said deformations are 
defined by planar perspective transformations. 

29. The method of claim 20 wherein said ray directions 
25 are defined by 3-dimensional rotations relative to said coor- 
dinate system. 

30. The method of claim 20 wherein said ray directions 
are defined by planar perspective transformations. 

31. The method of claim 20 wherein each of said ray 
30 directions is a ray extending from an origin of said coordi- 
nate system to the corresponding pixel location. 

32. The method of claim 20 wherein each of said ray 
directions is defined relative to at least one of (a) a pixel 
coordinate position, (b) a 3-dimensional rotational transfor- 

35 mation relative to said coordinate system, and (c) an image 
focal length. 

33. The method of claim 32 wherein said computing 
comprises computing at least one of an incremental rotation 
and an incremental change in focal length. 

40 34. The method of claim 32 wherein said computing 
comprises computing both an incremental rotation and an 
incremental change in focal length. 

35. The method of claim 20 wherein the steps of deter- 
mining and computing are preceded by performing an initial 

45 sequential alignment of said images to establish a transfor- 
mation of each image relating the respective image to said 
coordinate system. 

36. The method of claim 35 wherein incremental defor- 
mations comprise incremental changes of said transforma- 

50 tions. 

37. The method of claim 36 wherein said performing said 
initial sequential alignment comprises minimizing differ- 
ences between overlapping pixels of overlapping ones of 
said images. 

55 38. The method of claim 37 further comprising determin- 
ing which pairs of said images significantly overlap, and 
discarding other pairs of said images from said set of 
overlapping images. 

39. The method of claim 38 wherein other pairs of said 
60 images are discarded unless at least a predetermined fraction 

of the pixels thereof overlap. 

40. A computer-readable medium having computer- 
executable instructions for performing the steps recited in 
claim 20. 

65 41. Apparatus for use in performing simultaneous align- 
ment of a set of overlapping images from which a mosaic 
image can be produced, said apparatus comprising: 
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a processor; 

memory having executable instructions stored therein; 
and, 

wherein the processor, in response to the instructions 
stored in the memory: 

for each one of said images of said set, determines ray 
directions relative to a 3-dimensional coordinate 
system at plural predetermined pixel locations in said 
one image; 

for each one of said plural pixel locations in said one 
image: 

(a) determines ray directions relative to said 
3-dimensional coordinate system of the corre- 
sponding pixel location in each one of the other 
images overlapping said one predetermined pixel 
location of said one image, and 

(b) computes incremental deformations of said over- 
lapping images which simultaneously minimize 
differences between the ray directions of plural 
pairs of said overlapping images which include 
said one image. 

42. The apparatus of claim 41 wherein said processor 
determines said ray directions and computes said incremen- 
tal deformations for each of said plural predetermined pixel 
locations of said one image simultaneously. 

43. The apparatus of claim 41 wherein said processor 
further warps said images in accordance with said incre- 
mental deformations. 

44. The apparatus of claim 41 wherein said one image is 
divided into plural patches and said plural predetermined 
pixel locations constitute one pixel in each of said plural 
patches. 

45. The apparatus of claim 44 wherein said one pixel in 
each of said patches comprises a center pixel of said patch. 

46. The apparatus of claim 41 wherein said processor 
computes said deformations as incremental 3-dimensional 
rotations relative to said coordinate system. 

47. The apparatus of claim 41 wherein processor com- 
putes said deformations as planar perspective transforma- 
tions. 

48. The apparatus of claim 41 wherein said ray directions 
are defined by 3-dimensional rotations relative to said coor- 
dinate system. 

49. The apparatus of claim 41 wherein said ray directions 
are defined by planar perspective transformations. 

50. The apparatus of claim 41 wherein each of said ray 
directions is a ray extending from an origin of said coordi- 
nate system to the corresponding pixel location. 

51. The apparatus of claim 41 wherein each of said ray 
directions is defined relative to at least one of (a) a pixel 
coordinate position, (b) a 3-dimensional rotational transfor- 
mation relative to said coordinate system, and (c) an image 
focal length. 

52. The apparatus of claim 51 wherein said processor 
computes said incremental deformation by computing at 
least one of an incremental rotation and an incremental 
change in focal length. 

53. The apparatus of claim 51 wherein said computes 
comprises computes both an incremental rotation and an 
incremental change in focal length. 

54. The apparatus of claim 41 wherein said processor, in 
further response to said instructions, performs an initial 
sequential alignment of said images to establish a transfor- 
mation of each image relates the respective image to said 
coordinate system, 

55. The apparatus of claim 54 wherein incremental defor- 
mations comprise incremental changes of said transforma- 
tions. 
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56. The apparatus of claim 55 wherein said processor 
performs said initial sequential alignment in that said pro- 
cessor minimizes differences between overlapping pixels of 
overlapping ones of said images. 
5 57. The apparatus of claim 56 wherein said processor, in 
further response to said insturctions, determines which pairs 
of said images significantly overlap, and discards other pairs 
of said images from said set of overlapping images. 

58. The apparatus of claim 57 wherein said processor, in 
10 response to said instructions, discards other pairs of said 

images unless at least a predetermined fraction of the pixels 
thereof overlap. 

59. Apparatus for use in performing simultaneous align- 
ment of a set of overlapping images from which a mosaic 

15 image can be produced, said apparatus comprising: 
a processor; 

memory having executable instructions stored therein; 
and, 

2Q wherein the processor, in response to the instructions 
stored in the memory: 

for each one of said images of said set, determines ray 
directions relative to a 3-dimensional coordinate 
system at plural pixel locations in a said one image; 
25 determines ray directions relative to said 3-dimensional 
coordinate system at corresponding pixel locations 
in the other images overlyes said one image to form 
a bundle of rays at each of said corresponding pixel 
locations; and 

30 computes incremental deformations of said overlap- 
ping images which simultaneously minimize differ- 
ences in ray directions in each of said bundles of ray 
directions. 

60. The apparatus of claim 59 wherein said processor 
35 computes said incremental deformations in that said proces- 
sor: 

defines a consensus ray direction at each of said corre- 
sponding pixel locations; and 
minimizes differences between each of said bundle of rays 
40 and the corresponding consensus ray direction. 

61. The apparatus of claim 60 wherein said consensus ray 
direction comprises a normalized sum of the ray directions 
in the corresponding bundle of rays. 

62. The apparatus of claim 59 wherein said processor, in 
45 further response to said instructions, determines said ray 

directions and computes said incremental deformations for 
each of said plural predetermined pixel locations of said one 
image simultaneously. 

63. The apparatus of claim 62 wherein said processor 
50 determines said ray directions and computes said incremen- 
tal deformations for each of said plural predetermined pixel 
locations of said one image simultaneously. 

64. The apparatus of claim 62 wherein said processor 
further warps said images in accordance with said incre- 

55 mental deformations. 

65. The apparatus of claim 59 wherein said one image is 
divided into plural patches and said plural predetermined 
pixel locations constitute one pixel in each of said plural 
patches. 

60 66. The apparatus of claim 65 wherein said one pixel in 
each of said patches comprises a center pixel of said patch. 

67. The apparatus of claim 59 wherein said processor 
computes said deformations as incremental 3-dimensional 
rotations relative to said coordinate system. 

65 68. The apparatus of claim 59 wherein processor com- 
putes said deformations as planar perspective transforma- 
tions. 
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69. The apparatus of claim 59 wherein said ray directions 
are defined by 3 -dimensional rotations relative to said coor- 
dinate system. 

70. The apparatus of claim 59 wherein said ray directions 
are defined by planar perspective transformations. 5 

71 . The apparatus of claim 59 wherein each of said ray 
directions is a ray extending from an origin of said coordi- 
nate system to the corresponding pixel location. 

72. The apparatus of claim 59 wherein each of said ray 
directions is defined relative to at least one of (a) a pixel 10 
coordinate position, (b) a 3-dimensional rotational transfor- 
mation relative to said coordinate system, and (c) an image 
focal length. 

73. The apparatus of claim 72 wherein said processor 
computes said incremental deformation by computing at 15 
least one of an incremental rotation and an incremental 
change in focal length. 

74. The apparatus of claim 72 wherein said computes 
comprises computes both an incremental rotation and an 
incremental change in focal length. 20 

75. The apparatus of claim 59 wherein said processor, in 
further response to said instructions, performs an initial 
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sequential alignment of said images to establish a transfor- 
mation of each image relates the respective image to said 
coordinate system. 

76. The apparatus of claim 75 wherein incremental defor- 
mations comprise incremental changes of said transforma- 
tions. 

77. The apparatus of claim 76 wherein said processor 
performs said initial sequential alignment in that said pro- 
cessor minimizes differences between overlapping pixels of 
overlapping ones of said images. 

78. The apparatus of claim 77 wherein said processor, in 
further response to said insturctions, determines which pairs 
of said images significantly overlap, and discards other pairs 
of said images from said set of overlapping images. 

79. The apparatus of claim 78 wherein said processor, in 
response to said instructions, discards other pairs of said 
images unless at least a predetermined fraction of the pixels 
thereof overlap. 
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